Subject Areas: Computer Engineering
Mirmorsal Madani 1, Homayun Motameni 2, Hosein Mohamadi 3
1 - Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
2 - Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
3 - Department of Computer Engineering, Azadshahr Branch, Islamic Azad University, Azadshahr, Iran