Six Fuzzy Morphology Methods for Roles of Combines Standard Sentences Persian language
Subject Areas: Computer Engineering
Haniye Rezaei 1, Homayun Motameni 2, Behnam Barzegar 3
Keywords: Fuzzy System, Bi-gram, defuzzifier, independent roles, dependent roles
Abstract:
The ability to remove ambiguity is one of the main strengths of fuzzy systems in problems such as NLP and morphology. Most studies on the morphology of the Farsi language, however, analyze words by means of the statistical HMM method. This paper therefore applies statistical morphology, using a fuzzy system, to the role of words in a sentence, in the two sets of independent roles like: “Subject, adverb, possess, possessive, subject, subject header, appositive, conjunct, conjunctive, exclamation and proclaimed”. The fuzzy system uses the Max (Product) fuzzifier with Bi-gram labeling. In addition, given the importance of the defuzzifier step in all fuzzy systems, 6 types of defuzzifier (‘maximum membership, center of gravity, weighted average, mean of maximum, smallest of maximum, largest of maximum, center average’) were implemented for the first time. The results show that the center of gravity defuzzifier, with a mean of 63.698%, performed better than the other defuzzifier methods.
[1] H. Motameni and A. Peykar, "Morphology of Compounds as Standard Words in Persian through Hidden Markov Model and Fuzzy Method," Journal of Intelligent & Fuzzy Systems, vol. 30, no. 3, pp. 1567-1580, 2016.
[2] T. Chadza, K. G. Kyriakopoulos, S. Lambotharan, Analysis of hidden markov model learning algorithms for the detection and prediction of multi-stage network attacks, Future generation computer systems 108 (2020) 636–649.
[3] J. Wettig, S. Hiltunen and R. Yangarber, "Hidden Markov Models for induction of morphological structure of natural language," Department of Computer Science,University of Helsinki, Finland, Helsinki, 2010.
[4] S. Naderi Parizi, "Implementation of hidden Markov models associated with the ability to apply the language, grammar, search methods and the applicability of the model," Amirkabir University of Technology, Department of Computer Engineering and Information Technology, Tehran, 2007.
[5] C. C. Aggarwal, Data Mining, Switzerland: Springer International, 2015.
[6] M. Mohseni and B. Minaei-bidgoli, "A Persian Part-Of-Speech Tagger Based on Morphological Analysis," in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta, 2010.
[7] W. Khan, A. Daud, K. Khan, J. A. Nasir, M. Basheri, N. Aljohani, F. S. Alotaibi, Part of speech tagging in urdu: Comparison of machine and deep learning approaches, IEEE Access 7 (2019) 38918–38936.
[8] P. Koehn, Statistical Machine Translation, New York: United States of America by Cambridge University Press, 2010, pp. 181-212.
[9] M. Shrivastava, "Hindi POS Tagger Using Naive Stemming : Harnessing Morphological Information Without Extensive Linguistic Knowledge," in ICON-2008:6th International Conference on Natural Language Processing,Macmillan Publishers., India, 2008.
[10] M. Bahrani, H. Sameti, N. Hafezi and S. Momtazi, "A New Word Clustering Method for Building N-Gram Language Models in Continuous Speech Recognition Systems," in New Frontiers in Applied Artificial Intelligence, 21st International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE, Wroclaw, Poland, 2008.
[11] A. F. Alajmi, E. M. Saad and M. H. Awadalla, "Hidden markov model based Arabic morphological analyzer," International Journal of Computer Engineering Research, vol. 2, no. 2, pp. 28-33, 2011.
[12] D. Dutta, S. Halder, T. Gayen, Intelligent part of speech tagger for hindi, Procedia Computer Science 218 (2023) 604–611
[13] T.-L. Tseng, F. Jiang, Y. Kwon, Hybrid type ii fuzzy system & data mining approach for surface finish, Journal of Computational Design and Engineering 2 (3) (2015) 137–147.
[14] A. Chiche, B. Yitagesu, Part of speech tagging: a systematic review of deep learning and machine learning approaches, Journal of Big Data 9 (1) (2022) 1–25.
[15] D. Modi and N. Nain, "Part-of-Speech Tagging of Hindi Corpus Using Rule-Based Method," Proceedings of the International Conference on Recent Cognizance in Wireless Communication & Image Processing, Vols. 10.1007/978-81-322-2638-3, no. 28, pp. 241-247, 2016.
[16] J. S. Rohl, "A note on Backus Naur form," Department of Computer Science, The University, Manchester 13, 2010.
[17] S. Poria, E. Cambria, G. Winterstein and G.-B. Huang, "Sentic patterns: Dependency based rules for concept-level sentiment analysis.," Knowledge-Based Systems., 2014.
[18] M. R. Costa-Juss`a, M. Farr´us, J. B. Mari˜no and J. A. Fonollosa, "Study and comparison of rule-based and statistical catalan-spanish machine translation systems," Computing and Informatics, vol. 31, pp. 245-270, 2012.
[19] A. Alnaied, M. Elbendak, A. Bulbul, An intelligent use of stemmer and morphology analysis for arabic information retrieval, Egyptian Informatics Journal 21 (4) (2020) 209–217.
[20] K. Darwish, "Building a shallow morphological analyzer in one day.," in ACL-02 Workshop on Computational Approaches to Semitic Languages, Philadelphia, PA, 2002.
[21] K. Taghva, R. Elkhoury and J. S Coombs, "Arabic stemming without a root dictionary.," in Information Technology: Coding and Computing.ITCC 2005, 2005.
[22] A. Mohamad, A.-S. Riyad and K. Ghass, "Building an Effective Rule-Based Light Stemmer for Arabic Language to Improve Search Effectiveness.," International Arab Journal of Information Technology (IAJIT), vol. 9, no. 4, pp. 368-372, 2012.
[23] T. Buckwalter, "Buckwalter Arabic Morphological Analyzer.," the Linguistic Data Consortium,, Pennsylvania, 2002.
[24] H. K. Al Ameed, S. O. Al Ketbi, A. A. Al Kaabi, K. S. Al Shebli, N. F. Al Shamsi, N. H. Al Nuaimi and S. S. Al Muhairi, "Arabic light stemmer: a new enhanced approach," in The Second International Conference on Innovations in Information Technology (IIT’05), 2005.
[25] L. Larkey, L. Ballesteros and M. Connel, "Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis.," in 25th annual international ACM SIGIR conference on Research and development in information retrieval, 2002.
[26] A. El-Hajar, M. Hajar and K. Zreik, "A System for Evaluation of Arabic Root Extraction Methods.," in fifth international Conference on Internet and Web Applications and Services., 2010.
[27] H. Alshalabi, S. Tiun, N. Omar, F. N. AL-Aswadi, K. A. Alezabi, Arabic light-based stemmer using new rules, Journal of King Saud University-Computer and Information Sciences 34 (9) (2022) 6635–6642.
[28] M. F. Kabir, K. Abdullah-Al-Mamun, M. N. Huda, Deep learning based parts of speech tagger for bengali, in: 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), IEEE, 2016, pp. 26–29.
[29] K. K. Zin, N. Thein, Hidden markov model with rule based approach for part of speech tagging of myanmar language, in: Proceedings of 3rd International Conference on Communications and Information, 2009, pp. 123–128.
[30] F. Pisceldo, M. Adriani, R. Manurung, Probabilistic part of speech tagging for bahasa indonesia, in: Third international MALINDO workshop, 2009, pp. 1–6.
[31] M. Attia, Y. Samih, A. Elkahky, H. Mubarak, A. Abdelali, K. Darwish, Pos tagging for improving code-switching identification in arabic, in: Proceedings of the Fourth Arabic Natural Language Processing Workshop, 2019, pp. 18–29.
[32] F. Pisceldo, M. Adriani, R. Manurung, Probabilistic part of speech tagging for bahasa indonesia, in: Third international MALINDO workshop, 2009, pp. 1–6.
[33] M. Febryanto, I. Sulyaningsih, A. A. Zhafirah, Analysis of translation techniques and quality of translated terms of mechanical engineering in accredited national journals, Professional Journal of English Education 1 (2021) 116–119.
[34] A. Xv, Russian-english bidirectional machine translation system, in: Proceedings of the Fifth Conference on Machine Translation, 2020, pp. 320–325.
[35] H. Aldarmaki, A. Ullah, S. Ram, N. Zaki, Unsupervised automatic speech recognition: A review, Speech Communication 139 (2022) 76–91.
[36] M. Creutz, "Unsupervised segmentation of words using prior distributions of morph length and frequency," in Proc. 41st Meeting of ACL, Sapporo,Japan, 2003.
[37] F. Ahmed and A. Nürnberger, "N-grams Conflation Approach for Arabic," in ACM SIGIR Conference, Amsterdam, 2007.
[38] A. Y. Muaad, G. H. Kumar, J. Hanumanthappa, J. B. Benifa, M. N. Mourya, C. Chola, M. Pramodha, R. Bhairava, An effective approach for arabic document classification using machine learning, Global Transitions Proceedings 3 (1) (2022) 267–271.
[39] K. Tnaji, K. Bouzoubaa, S. L. Aouragh, A light arabic pos tagger using a hybrid approach, in: Digital Technologies and Applications: Proceedings of ICDTA 21, Fez, Morocco, Springer, 2021, pp. 199–208.
[40] M. El-Hadj, A.-S. IA and A.-A. AM, "Arabic Part of Speech Tagging Using the Sentence Structure.," in 2nd international Conference on Arabic Language Resources & Tools, Cairo, 2009.
[41] H. Hassani, Part of speech tagging (post) of a low-resource language using another language (developing a pos-tagged lexicon for kurdish (sorani) using a tagged persian (farsi) corpus), CoRR (2022) abs/2201.12793.
[42] S. Alqrainy, M. Alawairdhi, Towards developing a comprehensive tag set for the arabic language, Journal of Intelligent Systems 30 (1) (2020) 287–296.
[43] H. Motameni, A. Ebrahimnejad, J. Vahidi, et al., Morphology of composition functions in persian sentences through a newly proposed classified fuzzy method and center of gravity defuzzification method, Journal of Intelligent & Fuzzy Systems 36 (6) (2019) 5463–5473.
[44] S. M. Assi and M. Haji Abdolhosseini, "Grammatical tagging of a Farsi Corpus.," International Journal of Corpus Linguistics., vol. 5, no. 1, pp. 69-81, 2000.
[45] M. Bijankhan, J. Sheykhzadegan, M. Bahrani and M. Ghayoomi, "Lessons from Building a Persian Written Corpus: Peykare," Language Resources and Evaluation, vol. 45, pp. 143-164, 2011.
[46] M. Shamsfard, H. Sadat Jafari and M. Ilbe, "STeP-1: A Set of Fundamental Tools for Persian Text Processing," in LREC 2010, Valletta, Malta, 2010.
[47] H. T.-P., L. K.-Y. and W. S.-L., "Mining linguistic browsing patterns in the world wide web," Soft Computing, vol. 5, pp. 329-336, 2002.
[48] F. M. Zanzotto, L. Dell’Arciprete, A. Moschitti, Efficient graph kernels for textual entailment recognition, Fundamenta Informaticae 107 (2-3) (2011) 199–222.
[49] N. Passalis, J. Raitoharju, A. Tefas, M. Gabbouj, Efficient adaptive inference for deep convolutional neural networks using hierarchical early exits, Pattern Recognition 105 (2020) 107346.
[50] Z. Elaggoune, R. Maamri, I. Boussebough, A fuzzy agent approach for smart data extraction in big data environments, Journal of King Saud University-Computer and Information Sciences 32 (4) (2020) 465–478.
[51] M. E. Cintra, M. C. Monard, H. A. Camargo, A fuzzy decision tree algorithm based on c4. 5, Mathware & Soft Computing 20 (1) (2013) 56–62.
[52] X. Sun, L. Yuan, M. Liu, S. Liang, D. Li, L. Liu, Quantitative estimation for the impact of mining activities on vegetation phenology and identifying its controlling factors from sentinel-2 time series, International Journal of Applied Earth Observation and Geoinformation 111 (2022) 102814.
[53] X. Bai, Y. Yang, Fuzzy decision tree algorithm based on feature value’s class contribution level, Iranian Journal of Fuzzy Systems 19 (4) (2022) 73–88.
[54] S. Sayami, S. Shakya, Nepali pos tagging using deep learning approaches, NU. International Journal of Science 17 (2) (2020) 69–84.
[55] A. Krassimir, "Intuitionistic fuzzy logics as tools for evaluation of Data Mining processes," 25th anniversary of Knowledge-Based Systems, vol. 80, pp. 122-130, 2015.
[56] M. Moniri, "Fuzzy and Intuitionistic Fuzzy Turing Machines," Fundamenta Informaticae, vol. 123, no. 3, pp. 305-315, 2013.
[57] C. Rahul, T. Arathi, L. S. Panicker, R. Gopikakumari, Morphology & word sense disambiguation embedded multimodal neural machine translation system between sanskrit and malayalam, Biomedical Signal Processing and Control 85 (2023) 105051.
[58] A. Bria, W. Faber and N. Leone, "Normal Form Nested Programs," Fundamenta Informaticae, vol. 96, no. 3, pp. 271-295, 2009.
[59] R. Jayashree, S. K. Murthy, K. Sunny, Keyword extraction based summarization of categorized kannada text documents, International Journal on Soft Computing 2 (4) (2011) 81.
[60] C. Gupta, A. Jain, N. Joshi, Fuzzy logic in natural language processing–a closer view, Procedia computer science 132 (2018) 1375–1384.
[61] G. Chen, T. T. Pham, N. Boustany, Introduction to fuzzy sets, fuzzy logic, and fuzzy control systems, Applied Mechanics Reviews 54 (6) (2001) B102–B103.
[62] H. Englund, H. Stockhult, S. Du Rietz, A. Nilsson, G. Wennblom, Learning-environment uncertainty and students’ approaches to learning: A self-determination theory perspective, Scandinavian Journal of Educational Research (2022) 1–15.
[63] T. Chen, An innovative fuzzy and artificial neural network approach for forecasting yield under an uncertain learning environment, Journal of Ambient Intelligence and Humanized Computing 9 (2018) 1013–1025.
[64] A. Chiche, B. Yitagesu, Part of speech tagging: a systematic review of deep learning and machine learning approaches, Journal of Big Data 9 (1) (2022) 1–25.
[65] C. Marsala, B. Bouchon-Meunier, Fuzzy data mining and management of interpretable and subjective information, Fuzzy Sets and Systems 281 (2015) 252–259.
[66] C. Marsala and B. Bouchon-Meunier, "Fuzzy data mining and management of interpretable and subjective information," Fuzzy Sets and Systems, vol. 281, no. Special Issue Celebrating the 50th Anniversary of Fuzzy Sets, p. 252–259, 2015.
[67] F. M. Zanzotto, L. Dell'Arciprete and A. Moschitti, "Efficient Graph Kernels for Textual Entailment Recognition," Fundamenta Informaticae, vol. 107, no. 2-3, pp. 199-222, 2011.
[68] T.-L. Tseng, F. Jiang and Y. Kwon, "Hybrid Type II fuzzy system & datamining approach for surface finish," Journal of Computational Design and Engineering, vol. 2, no. 3, pp. 137-147, 2015.
[69] E. J. Khatib, R. Barco, A. Gómez-Andrades, P. Muñoz and I. Serrano, "Data mining for fuzzy diagnosis systems in LTE networks," Expert Systems with Applications, vol. 42, no. 21, p. 7549–7559, 2015.
[70] A. Estiri, M. Kahani, H. Ghaemi and M. Abasi, "Improvement of An Abstractive Summarization Evaluation Tool using Lexical-Semantic Relations and Weighted Syntax Tags in Farsi Language," in 12th Iranian Conference on Intelligent Systems Higher Education Complex of Bam, Bam, 2014.
[71] A. Jacob, A. Babu and P. C. R. Raj, "TnT tagger with fuzzy rule based learning," in Signal Processing, Informatics, Communication and Energy Systems (SPICES), Kozhikode, 2015.
[72] A. R. Martinez, Part-of-speech tagging, Wiley Interdisciplinary Reviews: Computational Statistics 4 (1) (2012) 107–113.
[73] A. R. Martinez, "Part-of-speech tagging," Wiley Periodicals, Inc., vol. 4, pp. 107-113, 2012.
[74] H. Yamane and M. Hagiwara, "Oxymoron generation using an association word corpus and a large-scale N-gram corpus," Soft Computing, vol. 19, pp. 919-927, 2015.
[75] J. Hoon Kim, J. Seo and G. Chang Kim, "Estimating Membership Functions in a Fuzzy Network Model for Part-Of-Speech Tagging," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and, vol. 4, no. 4, pp. 309-320, 1996.
[76] K. Atanassov, "Intuitionistic fuzzy logics as tools for evaluation of Data Mining processes," 25th anniversary of Knowledge-Based Systems, vol. 80, pp. 122-130, 2015.
[77] A. Chitra and A. Rajkumar, "Paraphrase Extraction using fuzzy hierarchical clustering," Applied Soft Computing, vol. 34, p. 426–437, 2015.
[78] T. J. Ross, Properties of membership functions, fuzzification, and defuzzification, Fuzzy logic with engineering applications (2010) 89–116.
[79] K. Gilda, S. Satarkar, Analytical overview of defuzzification methods, International Journal of Advance Research, Ideas and Innovations in Technology 6 (2) (2020) 359–365.
[80] P. M. LaCasse, W. Otieno, F. P. Maturana, A hierarchical, fuzzy inference approach to data filtration and feature prioritization in the connected manufacturing enterprise, Journal of Big Data 5 (2018) 1–31.
[81] L. Perumal, F. H. Nagi, Switching control system based on largest of maximum (lom) defuzzification-theory and application, Fuzzy Logic–Controls, Concepts, Theories and Applications, InTech, Rijeka (2012) 301–324.
[82] L. Perumal and F. H. Nagi, "Switching Control System Based on Largest of Maximum (LOM) Defuzzification – Theory and Application," in Fuzzy Logic – Controls, Concepts, Theories and Applications, Slavka Krautzeka, InTech, 2012, pp. 301-325.
[83] S. Naaz, A. Alam and R. Biswas, "Effect of different defuzzification methods in a fuzzy based load balancing application," IJCSI International Journal of Computer Science Issues, vol. 8, no. 5, pp. 261-267, 2011.
[84] H. Tzung-Pei, C. Chun-Hao, W. Yu-Lung and L. Yeong-Chyi, "A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions," Soft Computing, vol. 10, p. 1091–1101, 2006.
[85] C. D. Manning, "Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?," in CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing., Tokyo, 2011.
Journal of Applied Dynamic Systems and Control, Vol. 8, No. 1, 2025: 44-60
Haniye Rezaei1, Homayun Motameni2*, Behnam Barzegar3
1 Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran. Email: haniye.rezaei@gmail.com
2* Corresponding Author: Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran. Email: motameni@iausari.ac.ir
3 Department of Computer Engineering, Babol Branch, Islamic Azad University, Babol, Iran. Email: barzegar.behnam@yohoo.com
Received: 2024.11.18; Accepted: 2025.02.02
1. Introduction
This paper addresses the labeling of Farsi words according to their role in the sentence. Because the proposed approach uses the Bi-gram labeling method, the research falls within the scope of word labeling by statistical methods.
As Section 2 shows, most studies on Persian deal with the analysis of individual words. The statistical method proposed in this paper, like previous research on fuzzy systems, uses a fuzzy system for decision making, whereas research in statistical morphology has so far applied HMM or other non-fuzzy methods.
Another novelty of this paper is the investigation of 6 types of defuzzifier within the Bi-gram labeling method, which gave the highest output compared with the Uni-gram labeling method and the combinational methods of past research.
It should also be noted that Bi-gram labeling was considered in three directions (role, type and words) in these calculations and in the fuzzy decision making. The first Bi-gram labeling goes from the analysis of the input word (one of the system inputs) to the combination of words. The second calculates the weight of words that arises from the words forming the sentence. The final Bi-gram labeling relates to the combination of words. After fuzzification and the creation of a Max (product) fuzzy relation among the labels, 6 types of defuzzification method were applied in the Bi-gram approach and their results were compared [1, 2].
2. Related works
Studies on lexicology, a major part of data mining, were started by Petr Trojanskij in 1933 [3, 4]. In the following years, data mining was evaluated by different methods [5], including rule-based [6], statistical and probabilistic [7], memory-based and combined methods [8]. Most studies on natural language processing fall into two sets, statistical and rule-based, or combine both with another method, as in [9] for the Indian language and [10, 11] for Arabic. Most combined methods pair rule-based and statistical methods, so these two families are reviewed here.
Rule-based method: This method uses a grammar for forming sentences and words. Chomsky (1956) was a pioneer in the use of grammar in NLP for languages other than Persian and Arabic. In the same year, Kleene grouped and prioritized words based on rules, using finite automata and regular expressions [12]. Continuing this line of research, [13] performed speech labeling with finite-state methods and [14] with simple rules, both by the rule-based method. Research [15] implemented part-of-speech labeling for the Indian language by the rule-based method. Backus [16] implemented semantics by the rule-based method. The study [17] concerns models with dependency-based rules at the conceptual level of web pages [18]. Research on Catalan-Spanish machine translation can be based on the rules of these languages. For Persian and Arabic, Khoja and Garside used the rule-based method to find the roots of words from word patterns [19]. Another method [20] found the roots of words by counting prefixes and suffixes and applying the rules of Arabic. Taghva et al. [21] found the roots of Arabic words with the Khoja method, but without a dictionary. Ababneh et al. [22] studied root finding for Arabic words to improve search results. Buckwalter [23] implemented a morphological analysis of Arabic. Studies [24, 25] applied the rule-based method to improve root finding and clarification of Arabic words, and [26] studied a system for evaluating Arabic root finding with that grammar and the relevant rules. Bateni [27] derived the structure of different sentences from Persian grammar and then evaluated the conversion of each sentence type into another.
Probability and statistical method: Among researchers working on languages other than Persian and Arabic, Kaplan performed an empirical study of the statistical method [28] in the target language. The study [29] automatically tags random words of 5 languages, with results tested on 7 languages. Research [30] tags English words with a probabilistic model. Studies [31, 32] are part-of-speech taggers based on a statistical model. Research [33] applied statistical models and methods to machine translation, while [34] addresses machine translation between English and Russian. Statistical methods were also used in [35, 36] to find the morphological form of words and to classify terms. For Persian and Arabic, the statistical study [37] uses an N-gram tagger to find the roots of Arabic words. In [38], the N-gram statistical method is used to classify Arabic documents. Some studies use a statistical N-gram tagger and the Hidden Markov Model [39, 40, 41] to tag Arabic speech [42]. As with Arabic, statistical research on Persian uses HMM and N-gram methods for a statistical morphological analyzer [43] and a grammatical tagger of Persian vocabulary [44]. These studies have opened a promising path in lexicology and natural language processing, particularly for Persian. It is worth mentioning that researchers such as “Bijan Khan” and “Shams Fard” have conducted effective research in Persian lexicology [45, 46].
Three decades after the first research in lexicology and natural language processing, fuzzy theory was presented by Lotfi A. Zadeh in a paper called “Fuzzy Sets” [47]. Later, in 1973, fuzzy control was established. The theory came to be used widely in many research fields and, as an expert system, gained a special position in areas including artificial intelligence (AI), smart systems, medicine, the chemical industry [48] and the transportation industry. One of its most important applications is data mining, of which lexicology is a part that has received less attention from researchers. The first research on speech determination and pattern recognition by fuzzy system is [49]. For obtaining web browsing patterns in data extraction by fuzzy system, see [50]. For fuzzy decision trees in data mining, the first studies are [51], followed by [52, 53]. For word tagging with fuzzy networks see [54], and for the use of intuitionistic fuzzy logic in data mining see [55].
Considering all advances in fuzzy data mining (such as [56] and [57]), fuzzy morphology, especially in Persian and Arabic, and in particular the combination of words in the sentence, which deals with the meaning of words in sentences, seems to have received less attention from researchers [58]. However, the specific applications and broad results of morphology systems, including machine translation [34], summarization [59], filtering [50], speech recognition, speech synthesis [60, 61], indexing and combining texts, make research on morphology and the operation of fuzzy systems essential.
Therefore, in this paper, a method [1] based on a fuzzy system is used to decide the role of words in Farsi sentences. The study is important in four respects. First, it examines the role of words in sentences rather than the individual words themselves, so its results relate to the meaning of the sentence. Second, the Bi-gram labeling method is used within a fuzzy system, which places the method among statistical methods. Third, the impact of each defuzzification method on recognizing the role of words in sentences is studied. Fourth, for each defuzzifier examined, dependent and independent roles, as well as common and less common dependent roles, are evaluated separately.
3. Method of determining the fuzzy role
The fuzzy method studied here assigns each word different levels of membership, expressed as fuzzy values, according to two factors: the letters forming the word and the transition from analysis to combination. In addition, the possibility of one role following another also affects the defuzzification. The fuzzy computational method therefore uses the disambiguating property of fuzzy systems and takes into account all aspects affecting a word's role [62, 63] in order to identify the role of words in sentences [64].
The specific features of this method are as follows [65, 48, 1]:
1. The expert system can be trained using the rules of Persian grammar, in both analysis and combination.
2. A word database for each role is used to obtain the weight of each word from the letters of the word.
3. Statistical computation is used for the transition from the analysis to the combination of words.
4. Statistical computation is used for the occurrence of each role after another role.
5. Independent and dependent roles are obtained separately.
6. Six different types of defuzzification are evaluated.
7. The results of the six types of defuzzification are compared.
8. With fuzzification, weak relations have less impact and strong relations have more impact.
9. The method can be used in other languages.
The algorithm of fuzzy calculations for determining the fuzzy role has 8 steps, as shown below:
1. Receiving the sentences and their analysis from the user.
2. Extracting the required matrices.
3. Forming the possible scenarios according to the number of words in the input sentence.
4. Removing the impossible scenarios.
5. Calculating the matrix of Realation_tarkib/ using the terms of the calculations expressed in step 6.
6. Deriving the different defuzzifiers available in section 4-3 using the in 2-3-3 section and of 3-1-3-3
7. Repeating steps 5 and 6 for all possible scenarios.
8. Obtaining the largest output among the possible scenarios of phrasing and displaying the output results [66, 1].
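The control flow of steps 3, 4, 7 and 8 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the role inventory `ROLES`, the transition weights in `TRANSITION`, the per-word weights and the `score` function are all hypothetical placeholders, and `score` merely stands in for the matrix calculation and defuzzification of steps 5 and 6, which are not reproduced here.

```python
from itertools import product

# Hypothetical role inventory and bi-gram transition weights (illustrative only).
ROLES = ["subject", "object", "verb"]
TRANSITION = {
    ("subject", "object"): 0.6,
    ("subject", "verb"): 0.4,
    ("object", "verb"): 0.9,
    ("object", "subject"): 0.1,
}

def candidate_scenarios(n_words):
    """Step 3: every assignment of one role per word."""
    return product(ROLES, repeat=n_words)

def is_possible(scenario):
    """Step 4: drop scenarios containing a role pair with no known transition."""
    return all((a, b) in TRANSITION for a, b in zip(scenario, scenario[1:]))

def score(scenario, word_weights):
    """Placeholder for steps 5-6: product of word weights and transition weights."""
    total = 1.0
    for i, role in enumerate(scenario):
        total *= word_weights[i].get(role, 0.0)
    for a, b in zip(scenario, scenario[1:]):
        total *= TRANSITION[(a, b)]
    return total

def best_roles(word_weights):
    """Steps 7-8: score every possible scenario and keep the largest."""
    scenarios = [s for s in candidate_scenarios(len(word_weights)) if is_possible(s)]
    return max(scenarios, key=lambda s: score(s, word_weights))

# Hypothetical per-word role weights for a three-word sentence.
weights = [
    {"subject": 0.8, "object": 0.2},
    {"object": 0.7, "verb": 0.1},
    {"verb": 0.9},
]
print(best_roles(weights))
```

Enumerating every scenario grows exponentially with sentence length, which is why removing impossible scenarios early (step 4) matters in practice.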
As can be seen in Fig 1, the fuzzy system approach consists of 4 overall steps, which are described in detail in this part to identify the role of words using the fuzzy system. First, the inputs required by the system are described, then the rules of Persian grammar needed to train the conclusion engine of the fuzzy system [67], and then the steps of drawing the required conclusions by means of the fuzzy system. In the 4th step, the process of extracting results by means of the different defuzzifiers is described and their steps are compared. Finally, a hypothetical example is described with the four methods [8, 68, 69].
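To make the defuzzifiers named in this paper concrete, the following Python sketch gives common textbook discrete forms of six of them. It is an assumption that these forms match the ones used in the paper. The argument names `xs` (sample points of the output universe), `mus` (membership values at those points), `centers` and `heights` (center and height of each output fuzzy set) are hypothetical.

```python
def max_membership(xs, mus):
    """Point with the highest membership (first one on ties)."""
    return xs[max(range(len(xs)), key=lambda i: mus[i])]

def center_of_gravity(xs, mus):
    """Discrete centroid: sum(x * mu) / sum(mu)."""
    return sum(x * m for x, m in zip(xs, mus)) / sum(mus)

def _maximizers(xs, mus):
    """All points whose membership equals the peak membership."""
    peak = max(mus)
    return [x for x, m in zip(xs, mus) if m == peak]

def mean_of_maximum(xs, mus):
    """Average of all points that reach the peak membership."""
    pts = _maximizers(xs, mus)
    return sum(pts) / len(pts)

def smallest_of_maximum(xs, mus):
    """Smallest point that reaches the peak membership."""
    return min(_maximizers(xs, mus))

def largest_of_maximum(xs, mus):
    """Largest point that reaches the peak membership."""
    return max(_maximizers(xs, mus))

def weighted_average(centers, heights):
    """Center of each output set weighted by the height of that set."""
    return sum(c * h for c, h in zip(centers, heights)) / sum(heights)

# Hypothetical aggregated output membership function.
xs = [0, 1, 2, 3, 4]
mus = [0.0, 0.5, 1.0, 1.0, 0.5]
print(center_of_gravity(xs, mus), mean_of_maximum(xs, mus))
```

On a membership function with a flat peak, as in this example, the three "of maximum" methods give different answers, which is one reason comparing defuzzifiers on the same input is informative.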
3.1. Input sentences and type of words
One input to the system is a set of sentences in the Farsi language. These sentences and words are in standard form; slang sentences are not considered in this study. Since some Persian words can have two parts (like the ‘has-been’ verb), it is essential to follow accurate typing rules, such as the standard use of the half-space (half-distance).
The other input to the system is the correct analysis tag of each word. To identify the type of words in the sentence, the system can take advantage of the many successful studies conducted in this field. To obtain the analysis of the words in the input sentences, it is possible to use an executable file, the "software of natural language processing" made in the "laboratory networks of Ferdowsi University of Mashhad". This software clearly reports only the analysis tags of the input sentences and is easily accessible on the site of this laboratory. The analysis tags discussed here comprise 10 tags: (verb, noun, adverb, adjective, pronoun, preposition, including overnight ;!;) [70].
3.2. Persian grammar rules
For this project, 194 possible combinations, with different wording, were derived from Persian grammar. The statistical results of these 194 wordings are used to train the expert system. First the analysis of each of the 194 wordings, and then their combination, were extracted with the help of school and high-school Persian grammar books and of experts in the field, and then applied to train the fuzzy system [1].
3.3. Conclusion engine
The Persian-language conclusion engine based on the fuzzy system is composed of two parts: labeling and defuzzification. First, the three types of labeling required for this method are presented in Section 3.3.1; then, using this labeling, defuzzification by the Max (product) method is described [8, 71].
3.3.1. Bi-gram labeling
One of the most successful labeling methods used in statistical morphology is the N-gram labeling approach. It is divided into classes by order: Uni-gram (first order), Bi-gram (second order) and Tri-gram (third order). In general, the N-gram method is calculated by the following formula [6, 7].
| (1) |
Changes of sentence 1 | Type of labeling | No. |
| Uni-gram | 1 |
| Bi-gram | 2 |
| Tri-gram | 3 |
| (2) |
| (3) |
| (4) |
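The three classes of N-gram labeling named above can be illustrated with a minimal Python sketch. It only shows how uni-, bi- and tri-grams are cut out of a sequence of tags; the function name `ngrams` is chosen here for illustration, and the sketch does not attempt to reproduce statements (1) to (4).

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous windows of n tokens, in their original order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Word types of the example sentence 'baran zud barid' used in this section.
types = ["noun", "adjective", "verb"]

unigrams = ngrams(types, 1)  # one tag at a time
bigrams = ngrams(types, 2)   # each tag together with the tag that follows it
trigrams = ngrams(types, 3)  # windows of three consecutive tags
```

A Bi-gram model then only needs statistics for the pairs in `bigrams`, which is why the matrices below are two-dimensional.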
No | Name of output matrix | Required input | Changes of statement 2 | Calculation of weights of roles |
1 | Matrix_b | ‘باران’ /baran/ | | % of ‘b’ followed by ‘a’ + % of ‘a’ followed by ‘r’ + % of ‘r’ followed by ‘a’ + % of ‘a’ followed by ‘n’ |
2 | Bi_gram_tarkib | ‘باران زود بارید.’ /baran zud barid/ and ‘noun, adjective and verb’ | | % of the role ‘adverb’ after the type ‘noun’ + % of the role ‘verb’ after the type ‘adjective’ |
3 | Bi_gram_kol | ‘باران زود بارید.’ /baran zud barid/ | | % of the role ‘adverb’ after the role ‘subject’ + % of the role ‘verb’ after the role ‘adverb’ |
| (5) |
| (6) |
| (7) |
| (8) |
| (9) |
| (10) |
| (11) |
| (12) |
| (13) |
| (14) |
| ا | ب | پ | ت | ث | ج |
ا | 0.02 | 4.23 | 0.21 | 1.26 | 0.25 | 0.89 |
ب | 11.81 | 0.47 | 0.00 | 0.91 | 0.00 | 0.04 |
پ | 13.46 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
ت | 5.78 | 1.27 | 0.10 | 0.20 | 0.10 | 1.08 |
ث | 12.50 | 5.56 | 0.00 | 1.39 | 0.00 | 0.00 |
ج | 14.37 | 0.74 | 0.00 | 1.19 | 0.00 | 0.00 |
Table 4 shows the individual parts of the Matrix_B array, and their sums, for the hypothetical sentence ‘باران زود بارید.’/baran zud barid/, whose words have the types ‘noun, adjective, verb’. These calculations are fully described in section 1-1-3-3.
Table 4. Matrix_B calculations for the three example roles ‘subject, adverb and verb’
Input word | Role | % of presence of each letter after another in each role | Sum of weight of role |
‘باران’/baran/ | Subject | "b" then "a" + "a" then "r" + "r" then "a" + "a" then "n" | 0.132 + 0.098 + 0.0115 + 0.174 = 1.407 |
‘زود’/zud/ | Adverb | "z" then "u" + "u" then "d" | 0.066 + 0.272 = 0.338 |
‘بارید’/barid/ | Verb | "b" then "a" + "a" then "r" + "r" then "i" + "i" then "d" | 0.118 + 0.13 + 0.097 + 0.095 = 0.44 |
Therefore, for the hypothetical roles ‘subject, adverb and verb’, the Matrix_B values of this sentence are: ‘باران’/baran/ → subject = 1.407, ‘زود’/zud/ → adverb = 0.338, ‘بارید’/barid/ → verb = 0.44.
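The Matrix_B weight of a word can be sketched as a sum over its adjacent letter pairs. The following Python sketch uses the pair weights listed in Table 4 for the adverb and verb rows; the dictionary layout, the transliterated keys and the function name `matrix_b_weight` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Tuple

PairWeights = Dict[Tuple[str, str], float]

# Letter-pair weights per hypothetical role, copied from Table 4.
# Keys are (letter, following letter) in Latin transliteration.
ADVERB_WEIGHTS: PairWeights = {("z", "u"): 0.066, ("u", "d"): 0.272}
VERB_WEIGHTS: PairWeights = {
    ("b", "a"): 0.118,
    ("a", "r"): 0.13,
    ("r", "i"): 0.097,
    ("i", "d"): 0.095,
}


def matrix_b_weight(word: str, pair_weights: PairWeights) -> float:
    """Sum the weights of all adjacent letter pairs; unseen pairs add 0."""
    return sum(pair_weights.get(pair, 0.0) for pair in zip(word, word[1:]))


zud_as_adverb = matrix_b_weight("zud", ADVERB_WEIGHTS)
barid_as_verb = matrix_b_weight("barid", VERB_WEIGHTS)
```

With these inputs the two sums agree with the 0.338 and 0.44 reported in Table 4, up to floating-point rounding.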
For example, Table 5 shows part of the Bi_gram_tarkib matrix for the independent roles. The Bi_gram_tarkib matrix defines which role is placed at location i+1 after a word of a given type at location i.
Table 5. Some parts of Bi_gram_tarkib for the independent roles
Complementarity (j) | Object (i) | Predicate (D) | Subject headers (C) | Subject (A) | Type / Role |
5.556 | 7.407 | 0.000 | 0.000 | 3.704 | Verb |
8.261 | 12.174 | 5.652 | 0.435 | 8.261 | noun |
0.000 | 10.256 | 10.256 | 2.564 | 2.564 | adverb |
2.083 | 2.083 | 4.167 | 6.250 | 18.750 | adjective |
4.225 | 14.085 | 7.042 | 0.000 | 4.225 | pronoun |
36.416 | 6.936 | 2.312 | 0.578 | 21.387 | letter |
9.091 | 0.000 | 0.000 | 0.000 | 45.455 | clause |
5.882 | 23.529 | 29.412 | 0.000 | 29.412 | ؛ |
0.000 | 0.000 | 0.000 | 0.000 | 0.000 | ! |
0.000 | 0.000 | 0.000 | 0.000 | 0.000 | ، |
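As a reading aid for Table 5, the following Python sketch looks up, for a given word type, the independent role with the highest percentage. The first six rows are copied from Table 5 in their original column order; the helper name `most_frequent_role` is hypothetical, and picking the single maximum is only an illustration, not the fuzzy inference of statements 5 and 6.

```python
from typing import Dict, List, Tuple

# Column order exactly as printed in Table 5.
ROLES: List[str] = [
    "complementarity", "object", "predicate", "subject header", "subject",
]

# First six rows of Table 5 (Bi_gram_tarkib), keyed by word type.
BI_GRAM_TARKIB: Dict[str, List[float]] = {
    "verb":      [5.556, 7.407, 0.000, 0.000, 3.704],
    "noun":      [8.261, 12.174, 5.652, 0.435, 8.261],
    "adverb":    [0.000, 10.256, 10.256, 2.564, 2.564],
    "adjective": [2.083, 2.083, 4.167, 6.250, 18.750],
    "pronoun":   [4.225, 14.085, 7.042, 0.000, 4.225],
    "letter":    [36.416, 6.936, 2.312, 0.578, 21.387],
}


def most_frequent_role(word_type: str) -> Tuple[str, float]:
    """Return the (role, percentage) pair with the largest value in the row."""
    row = BI_GRAM_TARKIB[word_type]
    best = max(range(len(ROLES)), key=lambda j: row[j])
    return ROLES[best], row[best]


after_noun = most_frequent_role("noun")
after_adjective = most_frequent_role("adjective")
after_letter = most_frequent_role("letter")
```

For these rows the lookup gives object after a noun, subject after an adjective and complementarity after a letter.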
Table 6 shows the results of statement 6 for the input sentence ‘باران زود بارید.’/baran zud barid/. It is obtained from Table 5 using statements 5 and 6 of section 2-3-3.
Table 6. Values of Relation_tarkib matrix in the sentence of ‘باران زود بارید.’/baran zud barid/.
Type of words | verb | noun | adverb | adjective | pronoun | letter | clause | ! | ؛ | ، |
‘باران’/Baran/ | 0.0269 | 0.29 | 0.41 | 0.458 | 0.442 | 0.56 | 0.545 | 0.117 | 1 | 0.117 |
‘زود’/Zud/ | 0.0269 | 0.29 | 0.41 | 0.458 | 0.442 | 0.56 | 0.545 | 0.117 | 1 | 0.117 |
‘بارید’/Barid/ | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Table 7. Some parts of Bi_gram_kol matrix
Role | verb | adverb | Complementarity | object | predicate | Subject headers | subject |
subject | 0.201 | 0.04 | 0.04 | 0.114 | 0.0 | 0.0 | 0.148 |
Subject headers | 0 | 0 | 0.0 | 0.107 | 0.607 | 0.036 | 0.0 |
predicate | 0.6 | 0.033 | 0.0 | 0.0 | 0.133 | 0.0 | 0.0 |
object | 0.235 | 0.014 | 0.029 | 0.279 | 0.0 | 0.0 | 0.0 |
Complementarity | 0.576 | 0.01 | 0.196 | 0.043 | 0.22 | 0.0 | 0.0 |
adverb | 0.384 | 0.127 | 0 | 0.115 | 0 | 0.038 | 0.077 |
verb | 0 | 0 | 0 | 0.108 | 0 | 0 | 0.027 |
In Table 6, joint roles such as the verb, which are the same in analysis and in combination, require no calculation and therefore have a value of zero.
For example, Table 7 shows part of one of the two Bi_gram_kol matrices presented in section 3-1-3-3. The total Bi_gram_kol gives the percentage with which each role follows another at locations i and i+1, according to the statistical calculations defined in section 3-1-3-3. Bi_gram_kol is used in the fuzzy calculations.
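The Bi_gram_kol weight of a candidate role sequence can be sketched as the sum of the entries for each consecutive pair of roles, following the description "% of the role ‘adverb’ after the role ‘subject’ + % of the role ‘verb’ after the role ‘adverb’". The Python sketch below copies three rows of Table 7 and assumes that rows hold the role at location i and columns the role at location i+1; this orientation and the function name `sequence_weight` are assumptions, not taken from the source.

```python
from typing import Dict, Sequence

# Column order exactly as printed in Table 7.
COLUMNS = [
    "verb", "adverb", "complementarity", "object",
    "predicate", "subject header", "subject",
]

# Three rows of Table 7 (Bi_gram_kol). Assumed orientation:
# row = role at location i, column = role at location i+1.
ROWS: Dict[str, Sequence[float]] = {
    "subject": [0.201, 0.04, 0.04, 0.114, 0.0, 0.0, 0.148],
    "adverb":  [0.384, 0.127, 0.0, 0.115, 0.0, 0.038, 0.077],
    "verb":    [0.0, 0.0, 0.0, 0.108, 0.0, 0.0, 0.027],
}


def sequence_weight(roles: Sequence[str]) -> float:
    """Add the Bi_gram_kol entry of every consecutive pair of roles."""
    total = 0.0
    for current, following in zip(roles, roles[1:]):
        total += ROWS[current][COLUMNS.index(following)]
    return total


weight = sequence_weight(["subject", "adverb", "verb"])  # 0.04 + 0.384
```

Under the assumed orientation, the hypothetical assignment ‘subject, adverb, verb’ receives 0.04 + 0.384 = 0.424.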
Table 8 shows the defuzzifier calculations for the hypothetical sentence ‘باران زود بارید.’/baran zud barid/ with the word types ‘noun, adjective, verb’ and a single hypothetical assignment of roles, ‘subject, adverb, verb’, based on statements 8-14 and Tables 7 and 8.
Table 8. Calculations of gravity center for three hypothetical conditions
Calculation Sample | Title of defuzzifiers |
| Max membership |
| Center of Gravity |
| Largest of max |
| Smallest of max |
| Mean of max |
| Weighted average |
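The six defuzzifiers named in Table 8 can be sketched in Python with their common textbook definitions on a sampled output membership function. This is a generic illustration under stated assumptions: the sample points `xs`, memberships `mus`, set centers and heights below are invented, at least one membership is assumed to be positive, and the formulas are not claimed to match statements 8-14 of this paper.

```python
from typing import List, Sequence


def _maximizers(xs: Sequence[float], mus: Sequence[float]) -> List[float]:
    """Return every x whose membership equals the highest membership."""
    peak = max(mus)
    return [x for x, mu in zip(xs, mus) if mu == peak]


def max_membership(xs: Sequence[float], mus: Sequence[float]) -> float:
    """Height method: the first x that reaches the highest membership."""
    return _maximizers(xs, mus)[0]


def center_of_gravity(xs: Sequence[float], mus: Sequence[float]) -> float:
    """Discrete centroid: membership-weighted mean of the sampled x values."""
    return sum(x * mu for x, mu in zip(xs, mus)) / sum(mus)


def mean_of_max(xs: Sequence[float], mus: Sequence[float]) -> float:
    """Average of all x values that reach the highest membership."""
    peaks = _maximizers(xs, mus)
    return sum(peaks) / len(peaks)


def smallest_of_max(xs: Sequence[float], mus: Sequence[float]) -> float:
    return min(_maximizers(xs, mus))


def largest_of_max(xs: Sequence[float], mus: Sequence[float]) -> float:
    return max(_maximizers(xs, mus))


def weighted_average(centers: Sequence[float], heights: Sequence[float]) -> float:
    """Weight the center of each output set by its height."""
    return sum(c * h for c, h in zip(centers, heights)) / sum(heights)


# Invented sample: a plateau of full membership at x = 2 and x = 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
mus = [0.0, 0.5, 1.0, 1.0, 0.5]

results = {
    "Max membership": max_membership(xs, mus),
    "Center of Gravity": center_of_gravity(xs, mus),
    "Mean of max": mean_of_max(xs, mus),
    "Smallest of max": smallest_of_max(xs, mus),
    "Largest of max": largest_of_max(xs, mus),
    # Two invented output sets with centers 1 and 3 and heights 0.5 and 1.
    "Weighted average": weighted_average([1.0, 3.0], [0.5, 1.0]),
}
```

Only Center of Gravity uses the whole shape of the membership function; the four max-based methods look solely at the plateau of highest membership, which is one reason their results can differ on the same input.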
| (15) |
| (16) |
| (17) |
Title of Defuzzifications | Mean of defuzzifications |
1.Max of Membership | 43.648 |
2.Center of Gravity | 63.698 |
3.Largest of Max | 58.007 |
4.Smallest of Max | 56.711 |
5.Mean of Max | 53.475 |
6.Weighted Average | 38.607 |
Table 9 gives the exact mean success rate of each defuzzification method of section 4-3 over both the dependent and independent roles. The highest success rate in identifying the roles in the 70 experimental input sentences belongs to the Center of Gravity defuzzifier, followed by Largest of Max, Smallest of Max, Mean of Max and Max of Membership, in that order. The lowest success rate was obtained by the Weighted Average method.
Fig 2. Comparison of the total success percentage of each of the defuzzifiers
Figure 2 plots the success of each defuzzification method in identifying the combination of words in Persian sentences. There is a large difference between Largest of Max and Weighted Average. By contrast, Center of Gravity, Largest of Max, Smallest of Max and Mean of Max differ little: the maximum difference among these four defuzzification methods is about 10%.
Table 10. Mean success percentage of each of the defuzzifiers, separated by the sets of dependent and independent roles
Title of defuzzifier | Mean of independent roles | Mean of dependent roles |
1.Max of Membership | 56.026 | 31.270 |
2.Center of Gravity | 60.843 | 66.553 |
3.Largest of Max | 55.576 | 60.438 |
4.Smallest of Max | 54.148 | 59.275 |
5.Mean of Max | 52.288 | 54.663 |
6.Weighted Average | 32.284 | 44.930 |
Table 10 shows the success percentage for the dependent and independent roles, separately for each defuzzifier. For the independent roles, the top three positions belong to Center of Gravity, Max of Membership and, by a small margin, Largest of Max. For the dependent roles, Max of Membership has the lowest success percentage and the top three positions belong to Center of Gravity, Largest of Max and Smallest of Max.
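The relation between Tables 9 and 10 can be checked with a short Python sketch. It assumes that the overall value of each method is the unweighted mean of its independent and dependent means; this assumption is not stated in the text, but it is consistent with the printed values to within rounding.

```python
from typing import Dict, List, Tuple

# (mean of independent roles, mean of dependent roles), copied from Table 10.
TABLE_10: Dict[str, Tuple[float, float]] = {
    "Max of Membership": (56.026, 31.270),
    "Center of Gravity": (60.843, 66.553),
    "Largest of Max": (55.576, 60.438),
    "Smallest of Max": (54.148, 59.275),
    "Mean of Max": (52.288, 54.663),
    "Weighted Average": (32.284, 44.930),
}

# Overall means as printed in Table 9.
TABLE_9: Dict[str, float] = {
    "Max of Membership": 43.648,
    "Center of Gravity": 63.698,
    "Largest of Max": 58.007,
    "Smallest of Max": 56.711,
    "Mean of Max": 53.475,
    "Weighted Average": 38.607,
}

# Assumed relation: overall mean = average of the two role sets.
overall: Dict[str, float] = {
    name: (independent + dependent) / 2
    for name, (independent, dependent) in TABLE_10.items()
}

# Methods ordered from most to least successful overall.
ranking: List[str] = sorted(overall, key=overall.get, reverse=True)
```

The resulting order, Center of Gravity first and Weighted Average last, is the same as the order reported for Table 9.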
Fig 3. Comparison of the mean success percentage of each of the defuzzifiers, separated by the sets of dependent and independent roles
As shown in Figure 3, for four defuzzifiers the success percentages of the dependent and independent roles are close to each other: Center of Gravity, Largest of Max, Smallest of Max and Mean of Max. Only for Weighted Average and Max of Membership do the mean success percentages of the two sets differ widely. These two methods also have the lowest mean success percentages.
Table 11. Mean success percentage of each of the defuzzifiers, separated by the independent roles
Role / Defuzzifier | max of membership | center of gravity | mean max | largest of max | smallest of max | weighted average |
Subject | 55.814 | 6.977 | 41.860 | 46.512 | 51.163 | 81.395 |
Subject header | 59.459 | 75.676 | 56.757 | 56.757 | 54.054 | 18.919 |
predicate | 51.220 | 82.927 | 65.854 | 60.976 | 60.976 | 7.317 |
object | 63.636 | 63.636 | 63.636 | 63.636 | 54.545 | 45.455 |
Complementary | 50.000 | 75.000 | 33.333 | 50.000 | 50.000 | 8.333 |
Table 11 shows the success of each defuzzifier for each independent role of the Farsi language separately. The minimum and maximum success values are observed for the weighted average and center of gravity methods. At the same time, success in identifying the independent roles fluctuates severely for center of gravity (the best method for identifying these roles) and for weighted average (the worst method).
By contrast, the success values of the remaining four methods fluctuate little, so these methods are moderately successful in each of the independent roles, with a relative success of roughly 50-60%, as shown in Table 11.
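The fluctuation described above can be quantified from Table 11 with a small Python sketch. Using the range (maximum minus minimum) per method as the measure of fluctuation is a choice made here for illustration; the source does not name a measure.

```python
from typing import Dict, List

METHODS: List[str] = [
    "max of membership", "center of gravity", "mean max",
    "largest of max", "smallest of max", "weighted average",
]

# Rows of Table 11, in the column order of METHODS.
TABLE_11: Dict[str, List[float]] = {
    "Subject":        [55.814, 6.977, 41.860, 46.512, 51.163, 81.395],
    "Subject header": [59.459, 75.676, 56.757, 56.757, 54.054, 18.919],
    "predicate":      [51.220, 82.927, 65.854, 60.976, 60.976, 7.317],
    "object":         [63.636, 63.636, 63.636, 63.636, 54.545, 45.455],
    "Complementary":  [50.000, 75.000, 33.333, 50.000, 50.000, 8.333],
}


def column(method: str) -> List[float]:
    """Success percentages of one method over all independent roles."""
    j = METHODS.index(method)
    return [row[j] for row in TABLE_11.values()]


# Range per method: largest minus smallest success percentage.
spread = {m: max(column(m)) - min(column(m)) for m in METHODS}
# Mean per method over the five independent roles.
mean = {m: sum(column(m)) / len(column(m)) for m in METHODS}
# The two methods with the widest range.
widest_two = sorted(METHODS, key=spread.get, reverse=True)[:2]
```

On these values the two widest ranges belong to center of gravity and weighted average (about 76 and 74 points), while the per-method means reproduce the independent-role column of Table 10.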
As shown in Figure 4, the severe fluctuations of success across roles for the center of gravity and weighted average approaches are clearly visible. The mild slope of the changes in the four remaining methods is also evident.
Fig 4. Comparison of the mean success percentage of each of the defuzzifiers, separated by independent roles.
The center of gravity method has the highest mean; only for identifying the role of subject is it less efficient than the other methods. Moreover, in Farsi grammar both the subject and subject header roles can be considered ‘subject’. Viewed this way, the center of gravity method performs much better than the other methods in identifying the independent roles of sentences.
Table 12. Mean of the success percentage in each of the defuzzification methods separated by dependent roles
Role / Defuzzifier | max of membership | center of gravity | mean max | largest of max | smallest of max | weighted average |
Adjective | 10.000 | 55.000 | 10.000 | 70.000 | 10.000 | 35.000 |
Adverb | 7.407 | 92.593 | 92.593 | 92.593 | 92.593 | 70.370 |
Unknown | 3.846 | 23.077 | 50.000 | 55.769 | 51.923 | 76.923 |
Subject | 16.667 | 50.000 | 8.333 | 25.000 | 16.667 | 33.333 |
possessive | 11.765 | 41.176 | 29.412 | 17.647 | 29.412 | 17.647 |
possess | 14.286 | 28.571 | 14.286 | 7.143 | 21.429 | 14.286 |
appositive | 66.667 | 66.667 | 33.333 | 33.333 | 66.667 | 33.333 |
conjunct | 50.000 | 100.000 | 100.000 | 100.000 | 100.000 | 100.000 |
conjunctive | 50.000 | 100.000 | 100.000 | 100.000 | 100.000 | 75.000 |
exclamation | 50.000 | 75.000 | 75.000 | 75.000 | 75.000 | 0.000 |
proclaimed | 50.000 | 100.000 | 75.000 | 75.000 | 75.000 | 25.000 |
Table 12 shows in detail the success percentage for each dependent role under each of the six defuzzifier methods. Note that ‘unknown’ denotes words without any dependent role in the sentence. Dependent roles are roles whose existence in the sentence depends on another role.
Among the most successful methods for dependent roles, the center of gravity defuzzifier had a success rate below 40% for only two roles, whereas each of the other methods fell below 40% for at least four roles.
It should be noted, however, that these two weakest roles of the center of gravity defuzzification method lie in the first six rows of Table 12, that is, among the roles with a notable presence in the sentences of the Farsi language.
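The count of roles below 40% can be reproduced from Table 12 with the following Python sketch. The threshold of 40% comes from the text; the data layout and variable names are chosen here for illustration.

```python
from typing import Dict, List

METHODS: List[str] = [
    "max of membership", "center of gravity", "mean max",
    "largest of max", "smallest of max", "weighted average",
]

# Rows of Table 12, in the column order of METHODS.
TABLE_12: Dict[str, List[float]] = {
    "Adjective":   [10.000, 55.000, 10.000, 70.000, 10.000, 35.000],
    "Adverb":      [7.407, 92.593, 92.593, 92.593, 92.593, 70.370],
    "Unknown":     [3.846, 23.077, 50.000, 55.769, 51.923, 76.923],
    "Subject":     [16.667, 50.000, 8.333, 25.000, 16.667, 33.333],
    "possessive":  [11.765, 41.176, 29.412, 17.647, 29.412, 17.647],
    "possess":     [14.286, 28.571, 14.286, 7.143, 21.429, 14.286],
    "appositive":  [66.667, 66.667, 33.333, 33.333, 66.667, 33.333],
    "conjunct":    [50.000, 100.000, 100.000, 100.000, 100.000, 100.000],
    "conjunctive": [50.000, 100.000, 100.000, 100.000, 100.000, 75.000],
    "exclamation": [50.000, 75.000, 75.000, 75.000, 75.000, 0.000],
    "proclaimed":  [50.000, 100.000, 75.000, 75.000, 75.000, 25.000],
}

THRESHOLD = 40.0  # success percentage named in the text

# For each method, the dependent roles whose success is below the threshold.
weak_roles: Dict[str, List[str]] = {
    method: [role for role, row in TABLE_12.items() if row[j] < THRESHOLD]
    for j, method in enumerate(METHODS)
}
weak_counts: Dict[str, int] = {m: len(r) for m, r in weak_roles.items()}
```

For center of gravity the two roles found are ‘Unknown’ and ‘possess’, both within the first six rows of Table 12; every other method has at least four roles below the threshold.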
Turning to the least successful method for identifying dependent roles, the Max of Membership defuzzifier achieved success values of 50-66% in the five low-presence roles of appositive, conjunct, conjunctive, exclamation and proclaimed. These roles, however, occur only rarely in Farsi sentences. The role of ‘alternative’ is likewise rare and is covered by other roles, so it is not considered in this study. More importantly, in all six significant roles, ‘adjective, dependent adverb, unknown, subject, possessive and possess’, its success percentage did not exceed 17%.
Fig 5. Comparison of the success percentage of each of the defuzzifiers, separated by dependent roles.
As shown in Figure 5, for the first six dependent roles of Table 12, which are more important than the next five roles, the ranking by mean success is center of gravity, largest of max, weighted average, smallest of max, mean max and finally Max of Membership. For the last five, less significant roles of Table 12, Figure 5 shows that center of gravity, smallest of max, largest of max and mean max are close to one another, followed by Max of Membership and finally weighted average [1].
5. Conclusion and Suggestions
An investigation of fuzzy identification of the role of words in Farsi sentences shows that one drawback of this method, and of other statistics-based methods, is its heavy computational load. On the other hand, the method has advantages such as independence from a vocabulary bank in the fuzzy calculations and suitable accuracy in determining the role of words in the Farsi language.
Given these points, Center of Gravity is on average the best of the six defuzzification methods for identifying the role of words in Farsi sentences using Bi-gram labeling.
When the sets of dependent and independent roles are examined separately, the Center of Gravity method is also superior in each. For the independent roles, the second and third ranks belong to the Max of Membership and Largest of Max methods. For the frequently used dependent roles, the second and third ranks belong to Largest of Max and Weighted Average; for the rarely used dependent roles, the second rank belongs to Smallest of Max, and the third rank is shared by the Mean of Max and Largest of Max methods.
Therefore, to improve future research on fuzzy identification of the role of words in sentences and to strengthen the method discussed, the following points can be considered:
● Completing the statistical calculations and the grammar matrices obtained for training the fuzzy system.
● Studying fuzzy identification of the role of words in sentences of other languages.
● Studying the impact of labeling, and of N-grams of higher order N, on fuzzy identification of the role of words in sentences.
● Extending the grammar used to train the fuzzy system in line with Farsi grammar.
● Combining the fuzzy-based statistical method for identifying the role of words with other methods for identifying the type of words.
● Studying non-standard and slang sentences.
● Studying specifically the role of overlapping and similar-form words.
References
[1] H. Motameni and A. Peykar, "Morphology of Compounds as Standard Words in Persian through Hidden Markov Model and Fuzzy Method," Journal of Intelligent & Fuzzy Systems, vol. 30, no. 3, pp. 1567-1580, 2016.
[2] T. Chadza, K. G. Kyriakopoulos, S. Lambotharan, Analysis of hidden markov model learning algorithms for the detection and prediction of multi-stage network attacks, Future generation computer systems 108 (2020) 636–649.
[3] J. Wettig, S. Hiltunen and R. Yangarber, "Hidden Markov Models for induction of morphological structure of natural language," Department of Computer Science,University of Helsinki, Finland, Helsinki, 2010.
[4] S. Naderi Parizi, "Implementation of hidden Markov models associated with the ability to apply the language, grammar, search methods and the applicability of the model," Amirkabir University of Technology, Department of Computer Engineering and Information Technology, Tehran, 2007.
[5] C. C. Aggarwal, Data Mining, Switzerland: Springer International, 2015.
[6] M. Mohseni and B. Minaei-bidgoli, "A Persian Part-Of-Speech Tagger Based on Morphological Analysis," in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta, 2010.
[7] W. Khan, A. Daud, K. Khan, J. A. Nasir, M. Basheri, N. Aljohani, F. S. Alotaibi, Part of speech tagging in urdu: Comparison of machine and deep learning approaches, IEEE Access 7 (2019) 38918–38936.
[8] P. Koehn, Statistical Machine Translation, New York: United States of America by Cambridge University Press, 2010, pp. 181-212.
[9] M. Shrivastava, "Hindi POS Tagger Using Naive Stemming : Harnessing Morphological Information Without Extensive Linguistic Knowledge," in ICON-2008:6th International Conference on Natural Language Processing,Macmillan Publishers., India, 2008.
[10] M. Bahrani, H. Sameti, N. Hafezi and S. Momtazi, "A New Word Clustering Method for Building N-Gram Language Models in Continuous Speech Recognition Systems," in New Frontiers in Applied Artificial Intelligence, 21st International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE, Wroclaw, Poland, 2008.
[11] A. F. Alajmi, E. M. Saad and M. H. Awadalla, "Hidden markov model based Arabic morphological analyzer," International Journal of Computer Engineering Research, vol. 2, no. 2, pp. 28-33, 2011.
[12] D. Dutta, S. Halder, T. Gayen, Intelligent part of speech tagger for hindi, Procedia Computer Science 218 (2023) 604–611
[13] T.-L. Tseng, F. Jiang, Y. Kwon, Hybrid type ii fuzzy system & data mining approach for surface finish, Journal of Computational Design and Engineering 2 (3) (2015) 137–147.
[14] A. Chiche, B. Yitagesu, Part of speech tagging: a systematic review of deep learning and machine learning approaches, Journal of Big Data 9 (1) (2022) 1–25.
[15] D. Modi and N. Nain, "Part-of-Speech Tagging of Hindi Corpus Using Rule-Based Method," Proceedings of the International Conference on Recent Cognizance in Wireless Communication & Image Processing, Vols. 10.1007/978-81-322-2638-3, no. 28, pp. 241-247, 2016.
[16] J. S. Rohl, "A note on Backus Naur form," Department of Computer Science, The University, Manchester 13, 2010.
[17] S. Poria, E. Cambria, G. Winterstein and G.-B. Huang, "Sentic patterns: Dependency based rules for concept-level sentiment analysis.," Knowledge-Based Systems., 2014.
[18] M. R. Costa-Jussà, M. Farrús, J. B. Mariño and J. A. Fonollosa, "Study and comparison of rule-based and statistical catalan-spanish machine translation systems," Computing and Informatics, vol. 31, pp. 245-270, 2012.
[19] A. Alnaied, M. Elbendak, A. Bulbul, An intelligent use of stemmer and morphology analysis for arabic information retrieval, Egyptian Informatics Journal 21 (4) (2020) 209–217.
[20] K. Darwish, "Building a shallow morphological analyzer in one day.," in ACL-02 Workshop on Computational Approaches to Semitic Languages, Philadelphia, PA, 2002.
[21] K. Taghva, R. Elkhoury and J. S Coombs, "Arabic stemming without a root dictionary.," in Information Technology: Coding and Computing.ITCC 2005, 2005.
[22] A. Mohamad, A.-S. Riyad and K. Ghass, "Building an Effective Rule-Based Light Stemmer for Arabic Language to Improve Search Effectiveness.," International Arab Journal of Information Technology (IAJIT), vol. 9, no. 4, pp. 368-372, 2012.
[23] T. Buckwalter, "Buckwalter Arabic Morphological Analyzer.," the Linguistic Data Consortium,, Pennsylvania, 2002.
[24] H. K. Al Ameed, S. O. Al Ketbi, A. A. Al Kaabi, K. S. Al Shebli, N. F. Al Shamsi, N. H. Al Nuaimi and S. S. Al Muhairi, "Arabic light stemmer: a new enhanced approach," in The Second International Conference on Innovations in Information Technology (IIT’05), 2005.
[25] L. Larkey, L. Ballesteros and M. Connel, "Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis.," in 25th annual international ACM SIGIR conference on Research and development in information retrieval, 2002.
[26] A. El-Hajar, M. Hajar and K. Zreik, "A System for Evaluation of Arabic Root Extraction Methods.," in fifth international Conference on Internet and Web Applications and Services., 2010.
[27] H. Alshalabi, S. Tiun, N. Omar, F. N. AL-Aswadi, K. A. Alezabi, Arabic light-based stemmer using new rules, Journal of King Saud University-Computer and Information Sciences 34 (9) (2022) 6635–6642.
[28] M. F. Kabir, K. Abdullah-Al-Mamun, M. N. Huda, Deep learning based parts of speech tagger for bengali, in: 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), IEEE, 2016, pp. 26–29.
[29] K. K. Zin, N. Thein, Hidden markov model with rule based approach for part of speech tagging of myanmar language, in: Proceedings of 3rd International Conference on Communications and Information, 2009, pp. 123–128.
[30] F. Pisceldo, M. Adriani, R. Manurung, Probabilistic part of speech tagging for bahasa indonesia, in: Third international MALINDO workshop, 2009, pp. 1–6.
[31] M. Attia, Y. Samih, A. Elkahky, H. Mubarak, A. Abdelali, K. Darwish, Pos tagging for improving code-switching identification in arabic, in: Proceedings of the Fourth Arabic Natural Language Processing Workshop, 2019, pp. 18–29.
[32] F. Pisceldo, M. Adriani, R. Manurung, Probabilistic part of speech tagging for bahasa indonesia, in: Third international MALINDO workshop, 2009, pp. 1–6.
[33] M. Febryanto, I. Sulyaningsih, A. A. Zhafirah, Analysis of translation techniques and quality of translated terms of mechanical engineering in accredited national journals, Professional Journal of English Education 1 (2021) 116–119.
[34] A. Xv, Russian-english bidirectional machine translation system, in: Proceedings of the Fifth Conference on Machine Translation, 2020, pp. 320–325.
[35] H. Aldarmaki, A. Ullah, S. Ram, N. Zaki, Unsupervised automatic speech recognition: A review, Speech Communication 139 (2022) 76–91.
[36] M. Creutz, "Unsupervised segmentation of words using prior distributions of morph length and frequency," in Proc. 41st Meeting of ACL, Sapporo,Japan, 2003.
[37] F. Ahmed and A. Nürnberger, "N-grams Conflation Approach for Arabic," in ACM SIGIR Conference, Amsterdam, 2007.
[38] A. Y. Muaad, G. H. Kumar, J. Hanumanthappa, J. B. Benifa, M. N. Mourya, C. Chola, M. Pramodha, R. Bhairava, An effective approach for arabic document classification using machine learning, Global Transitions Proceedings 3 (1) (2022) 267–271.
[39] K. Tnaji, K. Bouzoubaa, S. L. Aouragh, A light arabic pos tagger using a hybrid approach, in: Digital Technologies and Applications: Proceedings of ICDTA 21, Fez, Morocco, Springer, 2021, pp. 199–208.
[40] M. El-Hadj, A.-S. IA and A.-A. AM, "Arabic Part of Speech Tagging Using the Sentence Structure.," in 2nd international Conference on Arabic Language Resources & Tools, Cairo, 2009.
[41] H. Hassani, Part of speech tagging (post) of a low-resource language using another language (developing a pos-tagged lexicon for kurdish (sorani) using a tagged persian (farsi) corpus), CoRR (2022) abs/2201.12793.
[42] S. Alqrainy, M. Alawairdhi, Towards developing a comprehensive tag set for the arabic language, Journal of Intelligent Systems 30 (1) (2020) 287–296.
[43] H. Motameni, A. Ebrahimnejad, J. Vahidi, et al., Morphology of composition functions in persian sentences through a newly proposed classified fuzzy method and center of gravity defuzzification method, Journal of Intelligent & Fuzzy Systems 36 (6) (2019) 5463–5473.
[44] S. M. Assi and M. Haji Abdolhosseini, "Grammatical tagging of a Farsi Corpus.," International Journal of Corpus Linguistics., vol. 5, no. 1, pp. 69-81, 2000.
[45] M. Bijankhan, J. Sheykhzadegan, M. Bahrani and M. Ghayoomi, "Lessons from Building a Persian Written Corpus: Peykare," Language Resources and Evaluation, vol. 45, pp. 143-164, 2011.
[46] M. Shamsfard, H. Sadat Jafari and M. Ilbe, "STeP-1: A Set of Fundamental Tools for Persian Text Processing," in LREC 2010, Valletta, Malta, 2010.
[47] H. T.-P., L. K.-Y. and W. S.-L., "Mining linguistic browsing patterns in the world wide web," Soft Computing, vol. 5, pp. 329-336, 2002.
[48] F. M. Zanzotto, L. Dell’Arciprete, A. Moschitti, Efficient graph kernels for textual entailment recognition, Fundamenta Informaticae 107 (2-3) (2011) 199–222.
[49] N. Passalis, J. Raitoharju, A. Tefas, M. Gabbouj, Efficient adaptive inference for deep convolutional neural networks using hierarchical early exits, Pattern Recognition 105 (2020) 107346.
[50] Z. Elaggoune, R. Maamri, I. Boussebough, A fuzzy agent approach for smart data extraction in big data environments, Journal of King Saud University-Computer and Information Sciences 32 (4) (2020) 465–478.
[51] M. E. Cintra, M. C. Monard, H. A. Camargo, A fuzzy decision tree algorithm based on c4. 5, Mathware & Soft Computing 20 (1) (2013) 56–62.
[52] X. Sun, L. Yuan, M. Liu, S. Liang, D. Li, L. Liu, Quantitative estimation for the impact of mining activities on vegetation phenology and identifying its controlling factors from sentinel-2 time series, International Journal of Applied Earth Observation and Geoinformation 111 (2022) 102814.
[53] X. Bai, Y. Yang, Fuzzy decision tree algorithm based on feature value’s class contribution level, Iranian Journal of Fuzzy Systems 19 (4) (2022) 73–88.
[54] S. Sayami, S. Shakya, Nepali pos tagging using deep learning approaches, NU. International Journal of Science 17 (2) (2020) 69–84.
[55] A. Krassimir, "Intuitionistic fuzzy logics as tools for evaluation of Data Mining processes," 25th anniversary of Knowledge-Based Systems, vol. 80, pp. 122-130, 2015.
[56] M. Moniri, "Fuzzy and Intuitionistic Fuzzy Turing Machines," Fundamenta Informaticae, vol. 123, no. 3, pp. 305-315, 2013.
[57] C. Rahul, T. Arathi, L. S. Panicker, R. Gopikakumari, Morphology & word sense disambiguation embedded multimodal neural machine translation system between sanskrit and malayalam, Biomedical Signal Processing and Control 85 (2023) 105051.
[58] A. Bria, W. Faber and N. Leone, "Normal Form Nested Programs," Fundamenta Informaticae, vol. 96, no. 3, pp. 271-295, 2009.
[59] R. Jayashree, S. K. Murthy, K. Sunny, Keyword extraction based summarization of categorized kannada text documents, International Journal on Soft Computing 2 (4) (2011) 81.
[60] C. Gupta, A. Jain, N. Joshi, Fuzzy logic in natural language processing–a closer view, Procedia computer science 132 (2018) 1375–1384.
[61] G. Chen, T. T. Pham, N. Boustany, Introduction to fuzzy sets, fuzzy logic, and fuzzy control systems, Applied Mechanics Reviews 54 (6) (2001) B102–B103.
[62] H. Englund, H. Stockhult, S. Du Rietz, A. Nilsson, G. Wennblom, Learning-environment uncertainty and students’ approaches to learning: A self-determination theory perspective, Scandinavian Journal of Educational Research (2022) 1–15.
[63] T. Chen, An innovative fuzzy and artificial neural network approach for forecasting yield under an uncertain learning environment, Journal of Ambient Intelligence and Humanized Computing 9 (2018) 1013–1025.
[64] A. Chiche, B. Yitagesu, Part of speech tagging: a systematic review of deep learning and machine learning approaches, Journal of Big Data 9 (1) (2022) 1–25.
[65] C. Marsala, B. Bouchon-Meunier, Fuzzy data mining and management of interpretable and subjective information, Fuzzy Sets and Systems 281 (2015) 252–259.
[66] C. Marsala and B. Bouchon-Meunier, "Fuzzy data mining and management of interpretable and subjective information," Fuzzy Sets and Systems, vol. 281, no. Special Issue Celebrating the 50th Anniversary of Fuzzy Sets, p. 252–259, 2015.
[67] F. M. Zanzotto, L. Dell'Arciprete and A. Moschitti, "Efficient Graph Kernels for Textual Entailment Recognition," Fundamenta Informaticae, vol. 107, no. 2-3, pp. 199-222, 2011.
[68] T.-L. Tseng, F. Jiang and Y. Kwon, "Hybrid Type II fuzzy system & datamining approach for surface finish," Journal of Computational Design and Engineering, vol. 2, no. 3, pp. 137-147, 2015.
[69] E. J. Khatib, R. Barco, A. Gómez-Andrades, P. Muñoz and I. Serrano, "Data mining for fuzzy diagnosis systems in LTE networks," Expert Systems with Applications, vol. 42, no. 21, pp. 7549–7559, 2015.
[70] A. Estiri, M. Kahani, H. Ghaemi and M. Abasi, "Improvement of an Abstractive Summarization Evaluation Tool using Lexical-Semantic Relations and Weighted Syntax Tags in Farsi Language," in 12th Iranian Conference on Intelligent Systems, Higher Education Complex of Bam, Bam, 2014.
[71] A. Jacob, A. Babu and P. C. R. Raj, "TnT tagger with fuzzy rule based learning," in Signal Processing, Informatics, Communication and Energy Systems (SPICES), Kozhikode, 2015.
[72] A. R. Martinez, Part-of-speech tagging, Wiley Interdisciplinary Reviews: Computational Statistics 4 (1) (2012) 107–113.
[73] A. R. Martinez, "Part-of-speech tagging," Wiley Periodicals, Inc., vol. 4, pp. 107-113, 2012.
[74] H. Yamane and M. Hagiwara, "Oxymoron generation using an association word corpus and a large-scale N-gram corpus," Soft Computing, vol. 19, pp. 919-927, 2015.
[75] J. Hoon Kim, J. Seo and G. Chang Kim, "Estimating Membership Functions in a Fuzzy Network Model for Part-Of-Speech Tagging," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, vol. 4, no. 4, pp. 309-320, 1996.
[76] K. Atanassov, "Intuitionistic fuzzy logics as tools for evaluation of Data Mining processes," 25th anniversary of Knowledge-Based Systems, vol. 80, pp. 122-130, 2015.
[77] A. Chitra and A. Rajkumar, "Paraphrase Extraction using fuzzy hierarchical clustering," Applied Soft Computing, vol. 34, pp. 426–437, 2015.
[78] T. J. Ross, Properties of membership functions, fuzzification, and defuzzification, Fuzzy logic with engineering applications (2010) 89–116.
[79] K. Gilda, S. Satarkar, Analytical overview of defuzzification methods, International Journal of Advance Research, Ideas and Innovations in Technology 6 (2) (2020) 359–365.
[80] P. M. LaCasse, W. Otieno, F. P. Maturana, A hierarchical, fuzzy inference approach to data filtration and feature prioritization in the connected manufacturing enterprise, Journal of Big Data 5 (2018) 1–31.
[81] L. Perumal, F. H. Nagi, Switching control system based on largest of maximum (LOM) defuzzification: theory and application, Fuzzy Logic: Controls, Concepts, Theories and Applications, InTech, Rijeka (2012) 301–324.
[82] L. Perumal and F. H. Nagi, "Switching Control System Based on Largest of Maximum (LOM) Defuzzification – Theory and Application," in Fuzzy Logic – Controls, Concepts, Theories and Applications, Slavka Krautzeka, InTech, 2012, pp. 301-325.
[83] S. Naaz, A. Alam and R. Biswas, "Effect of different defuzzification methods in a fuzzy based load balancing application," IJCSI International Journal of Computer Science Issues, vol. 8, no. 5, pp. 261-267, 2011.
[84] H. Tzung-Pei, C. Chun-Hao, W. Yu-Lung and L. Yeong-Chyi, "A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions," Soft Computing, vol. 10, pp. 1091–1101, 2006.
[85] C. D. Manning, "Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?," in CICLing'11 Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing, Tokyo, 2011.