A New Approach to Define the Number of Clusters for Partitional Clustering Algorithms
Subject Areas : Transactions on Fuzzy Sets and SystemsHuliane Silva 1 , Benjamın Ren Callejas Bedregal 2 , Anne Canuto 3 , Thiago Batista 4 , Ronildo Moura 5
1 - Departamento de Engenharias e Tecnologias, Universidade Federal Rural do Semi-rido, Pau dos Ferros, Brasil.
2 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.
3 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.
4 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.
5 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.
Keywords: Partitional clustering algorithms, Clustering fuzzy, Number of cluster,
Abstract :
Data clustering consists of grouping similar objects according to some characteristic. In the literature, there are several clustering algorithms, among which stands out the Fuzzy C-Means (FCM), one of the most discussed algorithms, being used in different applications. Although it is a simple and easy to manipulate clustering method, the FCM requires as its initial parameter the number of clusters. Usually, this information is unknown, beforehand and this becomes a relevant problem in the data cluster analysis process. In this context, this work proposes a new methodology to determine the number of clusters of partitional algorithms, using subsets of the original data in order to define the number of clusters. This new methodology, is intended to reduce the side effects of the cluster definition phase, possibly making the processing time faster and decreasing the computational cost. To evaluate the proposed methodology, different cluster validation indices will be used to evaluate the quality of the clusters obtained by the FCM algorithms and some of its variants, when applied to different databases. Through the empirical analysis, we can conclude that the results obtained in this article are promising, both from an experimental point of view and from a statistical point of view.
[1] Bedregal BR. A comparative study between fuzzy c-means and ckmeans algorithms. In: 2010 Annual Meeting of the North American Fuzzy Information Processing Society. 2010. p.1-6. DOI:https://www.doi.org/10.1109/NAFIPS.2010.5548194
[2] Bezdek JC. Pattern Recognition with Fuzzy Objective Function Algorithms. Norwell, MA, USA: Kluwer Academic Publishers; 1981.
[3] Bezdek JC, Ehrlich R, Full W. FCM: The fuzzy c-means clustering algorithm. Computers & geosciences. 1984; 10(2-3): 191-203. DOI: https://doi.org/0.1016/0098-3004(84)90020-7
[4] Bezdek JC, Pal NR. Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 1998; 28(3): 301-315. DOI: https://doi.org/10.1109/3477.678624
[5] Bong CW, Rajeswari M. Multi-objective nature-inspired clustering and classification techniques for image segmentation. Applied soft computing. 2011; 11(4): 3271-3282. DOI:
https://doi.org/10.1016/j.asoc.2011.01.014
[6] Carvalho A, Faceli K, Lorena A, Gama J. Inteligncia Artificialuma Abordagem de Aprendizado de Mquina. Rio de Janeiro: LTC; 2011.
[7] da Silva LR, Arnaldo HA, da Silva H, Moura R, Bedregal B, de P Canuto AM. Two deterministic selection methods for the initial centers in Fuzzy c-means based algorithms. Intelligent Data Analysis. 2020; 24(4): 779-798. DOI: https://doi.org/10.3233/IDA-194588
[8] Dave RN. Validating Fuzzy partitions obtained through c-shells clustering. Pattern recognition letters. 1996; 17(6): 613-623. DOI: https://doi.org/10.1016/0167-8655(96)00026-8
[9] Vargas RR, Bedregal BR, Palmeira ES. A comparison between k-means, fcm and ckmeans algorithms. In: 2011 Workshop-School on Theoretical Computer Science. 2011. p.32-38. DOI: https://doi.org/10.1109/WEIT.2011.28
[10] Vargas RR, Freddo R, Galafassi C, Gass SL, Russini A, Bedregal B. Identifying pixels classified uncertainties ckmeansimage algorithm. In: Information Processing and Management of Uncertainty in Knowledge-Based Systems. 2018. p.429-440. DOI:10.1007/978-3-319-91479-4 36
[11] Dinh DT, Fujinami T, Huynh VN. Estimating the optimal number of clusters in categorical data clustering by silhouette coeffcient. In: Knowledge and Systems Sciences. 2019. p.1-17. DOI: https://doi.org/10.1007/978-981-15-1209-4_1
[12] Dunn JC. A Fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics. 1973; 3(3): 32-57. DOI: https://doi.org/10.1080/01969727308546046
[13] Fukuyama Y, Sugeno M. A new method of choosing the number of clusters for the Fuzzy c-mean method. The 5th Fuzzy Syst Symp. 1989; 247: 247-250.
[14] Hair JF, Black WC, Babin BJ, Anderson RE, Tatham RL. Anlise Multivariada de Dados. Porto Alegre: Bookman; 2009.
[15] Halkidi M, Batistakis Y, Vazirgiannis M. On clustering validation techniques. Journal of intelligent information systems. 2001; 17(2-3): 107-145. DOI: https://doi.org/10.1023/A:1012801612483
[16] Havens TC, Bezdek JC, Palaniswami M. Cluster validity for kernel Fuzzy clustering. In: 2012 IEEE International Conference on Fuzzy Systems. 2012. p.1-8. DOI: https://doi.org/10.1109/FUZZ-IEEE.2012.6250820
[17] Jain AK, Dubes RC. Algorithms for Clustering Data. Prentice-Hall; 1988.
[18] Jain AK, Murty MN, Flynn PJ. Data clustering: A review.ACM computing surveys (CSUR). 1999; 31(3): 264-323. DOI: https://doi.org/10.1145/331499.331504
[19] Kodinariya TM, Makwana PR. Review on determining number of cluster in K-means clustering. International Journal. 2013; 1(6): 90-95. DOI:10.18576/amis/100428
[20] Liao HY, Ng MK. Categorical data clustering with automatic selection of cluster number. Fuzzy Information and Engineering. 2009; 1(1): 5-25. DOI: https://doi.org/10.1007/s12543-009-0001-5
[21] Markelle K, Rachel L, Kolby N. The UCI Machine Learning Repository, https://archive.ics.uci.edu
[22] Peres SM, Rocha T, Bscaro HH, Madeo RC, Boscarioli C. Tutorial sobre Fuzzy-c-means e Fuzzy learning vector Quantization: Abordagens hbridas para tarefas de agrupamento e classificao. Revista de Informtica Terica e Aplicada. 2012; 19(1): 120-163. DOI: https://doi.org/10.22456/2175-2745.13764
[23] Ren M, Liu P, Wang Z, Yi J. A self-adaptive Fuzzy c-means algorithm for determining the optimal number of clusters. Computational intelligence and neuroscience. 2016; 2016(3): 1-12. DOI: https://doi.org/10.1155/2016/2647389
[24] Sentelle C, Hong SL, Anagnostopoulos GC, Georgiopoulos M. A Fuzzy GAP statistic for Fuzzy c-means. In: International Conference Artificial Intelligence and Soft Computing. 2007.
[25] Silva HM, Canuto AM, Medeiros IG, Xavier-Jnior JC. Cluster ensembles optimization using coral reefs optimization algorithm. In: Artificial Neural Networks and Machine LearningICANN 2016: 25th International Conference on Artificial Neural Networks. 2016; p.275-282. DOI: https://doi.org/10.1007/978-3-319-44781-0 33
[26] Silva L, Moura R, Canuto AM, Santiago RH, Bedregal B. An interval-based framework for uzzy clustering applications. IEEE Transactions on Fuzzy Systems. 2015; 23(6): 2174-2187. DOI: https://doi.org/10.1109/TFUZZ.2015.2407901
[27] Smith JW, Everhart JE, DicksonWC, KnowlerWC, Johannes RS. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In: Proceedings of the annual symposium on computer application in medical care. 1988. p.261.
[28] Sun H, Wang S, Jiang Q. FCM-based model selection algorithms for determining the number of clusters. Pattern recognition. 2004; 37(10): 2027-2037. DOI: https://doi.org/10.1016/j.patcog.2004.03.012
[29] Tsai DM, Lin CC. Fuzzy C-means based clustering for linearly and nonlinearly separable data. Pattern recognition. 2011; 44(8): 1750-1760. DOI: https://doi.org/10.1016/j.patcog.2011.02.009
[30] nl R, Xanthopoulos P. Estimating the number of clusters in a dataset via consensus clustering. Expert Systems with Applications. 2019; 125: 33-39. DOI: https://doi.org/10.1016/j.eswa.2019.01.074
[31] Wang W, Zhang Y. On Fuzzy cluster validity indices. Fuzzy sets and systems. 2007; 158(19): 2095-2117. DOI: https://doi.org/10.1016/j.fss.2007.03.004
[32] Xing R, Li C. Fuzzy c-means algorithm automatically determining optimal number of clusters. Computers, Materials and Continua. 2019; 60(2): 767-780. DOI: https://doi.org/10.32604/cmc.2019.04500
[33] Xu S, Hu L, Yang X, Liu X. A cluster number adaptive Fuzzy c-means algorithm for image segmentation. International Journal of Signal Processing, Image Processing and Pattern Recognition. 2013; 6(5): 191-204. DOI: http://dx.doi.org/10.14257/ijsip.2013.6.5.17
[34] Yejun X. Optimization of the clusters number of an improved fuzzy C-means clustering algorithm. In: 2015 10th International Conference on Computer Science & Education (ICCSE). 2015. p.931-935. DOI: 10.1109/ICCSE.2015.7250383