Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
Subject Areas: Data Mining
Mozhgan Rahimirad 1, Mohammad Mosleh 2, Amir Masoud Rahmani 3
1 - Ahvaz Branch, Islamic Azad University, Ahvaz, Iran
2 - Dezfool Branch, Islamic Azad University, Ahvaz, Iran
3 - Department of Computer Engineering, Science and Research Branch, Islamic Azad University
Keywords: Learning Automata, Text Mining, Feature Selection, Classification, PSO Algorithm
Abstract:
With the explosive growth in the amount of available information, tools and methods for searching, filtering and managing resources are in high demand. One of the major problems in text classification is the high dimensionality of the feature space. Reducing that dimensionality is therefore a central task in text classification. Many feature selection methods exist, but only a few are used for large-scale text classification problems. In this paper, we propose a new wrapper method based on the Particle Swarm Optimization (PSO) algorithm and the Support Vector Machine (SVM), and we combine it with Learning Automata to make it more efficient: the reward and penalty mechanism of the automata helps to select better features. To evaluate the proposed method, we compare it with a feature selection method based on a Genetic Algorithm on the Reuters-21578 dataset. The simulation results show that the proposed algorithm works more efficiently.
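The abstract does not spell out how PSO and the learning automata are coupled, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binary PSO over feature masks, one two-action automaton per feature updated with a linear reward-penalty rule, and an equal blend of the PSO sigmoid with the automaton's selection probability. The names `toy_fitness` and `pso_la_select` and all parameter values are hypothetical, and `toy_fitness` is a stand-in for the SVM accuracy that a real wrapper would measure on the selected features.

```python
import math
import random


def toy_fitness(mask, informative=(0, 3, 7, 12, 20), size_penalty=0.01):
    """Hypothetical stand-in for SVM accuracy on the selected features."""
    hits = sum(mask[i] for i in informative)
    return hits / len(informative) - size_penalty * sum(mask)


def pso_la_select(n_features, fitness, n_particles=20, n_iter=50,
                  reward=0.1, penalty=0.05, seed=0):
    rng = random.Random(seed)
    w, c1, c2, v_max = 0.7, 1.5, 1.5, 4.0  # assumed PSO settings
    # One two-action automaton per feature; p_select[i] = P(action "select").
    p_select = [0.5] * n_features
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    fit = [fitness(p) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    g = max(range(n_particles), key=lambda k: fit[k])
    gbest, gbest_fit = pos[g][:], fit[g]

    for _ in range(n_iter):
        for k in range(n_particles):
            for i in range(n_features):
                v = (w * vel[k][i]
                     + c1 * rng.random() * (pbest[k][i] - pos[k][i])
                     + c2 * rng.random() * (gbest[i] - pos[k][i]))
                vel[k][i] = max(-v_max, min(v_max, v))
                # Blend the binary-PSO sigmoid with the automaton's belief.
                prob = 0.5 / (1.0 + math.exp(-vel[k][i])) + 0.5 * p_select[i]
                pos[k][i] = 1 if rng.random() < prob else 0
            new_fit = fitness(pos[k])
            improved = new_fit > fit[k]
            fit[k] = new_fit
            # Linear reward-penalty update; the sampled bit is treated as
            # the action taken by that feature's automaton (an assumption).
            for i in range(n_features):
                p = p_select[i] if pos[k][i] else 1.0 - p_select[i]
                p = p + reward * (1.0 - p) if improved else p * (1.0 - penalty)
                p_select[i] = p if pos[k][i] else 1.0 - p
            if new_fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], new_fit
            if new_fit > gbest_fit:
                gbest, gbest_fit = pos[k][:], new_fit
    return gbest, gbest_fit, p_select


best_mask, best_fit, probs = pso_la_select(30, toy_fitness)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
```

In a real wrapper, `toy_fitness` would be replaced by a function that trains an SVM on the columns chosen by `mask` and returns a validation score; the rest of the loop would stay the same.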