‎A New Approach to Define the Number of Clusters for Partitional Clustering Algorithms

Subject Areas : Transactions on Fuzzy Sets and Systems

Huliane Silva ¹ , Benjamın Ren Callejas Bedregal ² , Anne Canuto ³ , Thiago Batista ⁴ , Ronildo Moura ⁵

1 - Departamento de Engenharias e Tecnologias, Universidade Federal Rural do Semi-rido, Pau dos Ferros, Brasil.
2 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.
3 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.
4 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.
5 - Departamento de Inform´atica e Matem´atica Aplicada, Universidade Federal do Rio Grande do Norte, Natal, Brasil.

Received: 2023-07-02 Accepted : 2023-09-03 Published : 2024-05-07

Keywords: Partitional clustering algorithms‎, ‎Clustering fuzzy‎, ‎Number of cluster‎,

Abstract :

‎Data clustering consists of grouping similar objects according to some characteristic‎. ‎In the literature‎, ‎there are several clustering algorithms‎, ‎among which stands out the Fuzzy C-Means (FCM)‎, ‎one of the most discussed algorithms‎, ‎being used in different applications‎. ‎Although it is a simple and easy to manipulate clustering method‎, ‎the FCM requires as its initial parameter the number of clusters‎. ‎Usually‎, ‎this information is unknown‎, ‎beforehand and this becomes a relevant problem in the data cluster analysis process‎. ‎In this context‎, ‎this work proposes a new methodology to determine the number of clusters of partitional algorithms‎, ‎using subsets of the original data in order to define the number of clusters‎. ‎This new methodology‎, ‎is intended to reduce the side effects of the cluster definition phase‎, ‎possibly making the processing time faster and decreasing the computational cost‎. ‎To evaluate the proposed methodology‎, ‎different cluster validation indices will be used to evaluate the quality of the clusters obtained by the FCM algorithms and some of its variants‎, ‎when applied to different databases‎. ‎Through the empirical analysis‎, ‎we can conclude that the results obtained in this article are promising‎, ‎both from an experimental point of view and from a statistical point of view‎.

References:

[1] Carvalho A, Faceli K, Lorena A, Gama J. Inteligência Artificial–uma Abordagem de Aprendizado de Máquina. Rio de Janeiro: LTC; 2011.
[2] Hair JF, Black WC, Babin BJ, Anderson RE, Tatham RL. Análise Multivariada de Dados. Porto Alegre: Bookman; 2009.
[3] Bong CW, Rajeswari M. Multi-objective nature-inspired clustering and classification techniques for image segmentation. Applied soft computing. 2011; 11(4): 3271-3282. DOI: https://doi.org/10.1016/j.asoc.2011.01.014
[4] Sun H, Wang S, Jiang Q. FCM-based model selection algorithms for determining the number of clusters. Pattern recognition. 2004; 37(10): 2027-2037. DOI: https://doi.org/10.1016/j.patcog.2004.03.012
[5] Sentelle C, Hong SL, Anagnostopoulos GC, Georgiopoulos M. A Fuzzy GAP statistic for Fuzzy c-means. In: International Conference Artificial Intelligence and Soft Computing. 2007.
[6] Kodinariya TM, Makwana PR. Review on determining number of cluster in K-means clustering. International Journal. 2013; 1(6): 90-95. DOI:10.18576/amis/100428
[7] Xu S, Hu L, Yang X, Liu X. A cluster number adaptive Fuzzy c-means algorithm for image segmentation. International Journal of Signal Processing, Image Processing and Pattern Recognition. 2013; 6(5): 191- 204. DOI: http://dx.doi.org/10.14257/ijsip.2013.6.5.17
[8] Ren M, Liu P, Wang Z, Yi J. A self-adaptive Fuzzy c-means algorithm for determining the optimal number of clusters. Computational intelligence and neuroscience. 2016; 2016(3): 1-12. DOI: https://doi.org/10.1155/2016/2647389
[9] Ünlü R, Xanthopoulos P. Estimating the number of clusters in a dataset via consensus clustering. Expert Systems with Applications. 2019; 125: 33-39. DOI: https://doi.org/10.1016/j.eswa.2019.01.074
[10] Jain AK, Murty MN, Flynn PJ. Data clustering: A review.ACM computing surveys (CSUR). 1999; 31(3): 264-323. DOI: https://doi.org/10.1145/331499.331504
[11] Wang W, Zhang Y. On Fuzzy cluster validity indices. Fuzzy sets and systems. 2007; 158(19): 2095-2117. DOI: https://doi.org/10.1016/j.fss.2007.03.004
[12] Xing R, Li C. Fuzzy c-means algorithm automatically determining optimal number of clusters. Computers, Materials and Continua. 2019; 60(2): 767-780. DOI: https://doi.org/10.32604/cmc.2019.04500
[13] Yejun X. Optimization of the clusters number of an improved fuzzy C-means clustering algorithm. In: 2015 10th International Conference on Computer Science & Education (ICCSE). 2015. p.931-935. DOI: 10.1109/ICCSE.2015.7250383
[14] Dinh DT, Fujinami T, Huynh VN. Estimating the optimal number of clusters in categorical data clustering by silhouette coefficient. In: Knowledge and Systems Sciences. 2019. p.1-17. DOI: https://doi.org/10.1007/978-981-15-1209-4_1
[15] Liao HY, Ng MK. Categorical data clustering with automatic selection of cluster number. Fuzzy Information and Engineering. 2009; 1(1): 5-25. DOI: https://doi.org/10.1007/s12543-009-0001-5
[16] Havens TC, Bezdek JC, Palaniswami M. Cluster validity for kernel Fuzzy clustering. In: 2012 IEEE International Conference on Fuzzy Systems. 2012. p.1-8. DOI: https://doi.org/10.1109/FUZZIEEE.2012.6250820
[17] Markelle K, Rachel L, Kolby N. The UCI Machine Learning Repository, https://archive.ics.uci.edu
[18] Smith JW, Everhart JE, Dickson WC, Knowler WC, Johannes RS. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In: Proceedings of the annual symposium on computer application in medical care. 1988. p.261.
[19] Bezdek JC. Pattern Recognition with Fuzzy Objective Function Algorithms. Norwell, MA, USA: Kluwer Academic Publishers; 1981.
[20] Dunn JC. A Fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics. 1973; 3(3): 32-57. DOI: https://doi.org/10.1080/01969727308546046
[21] Bedregal BR. A comparative study between fuzzy c-means and ckmeans algorithms. In: 2010 Annual Meeting of the North American Fuzzy Information Processing Society. 2010. p.1-6. DOI: https://www.doi.org/10.1109/NAFIPS.2010.5548194
[22] Tsai DM, Lin CC. Fuzzy C-means based clustering for linearly and nonlinearly separable data. Pattern recognition. 2011; 44(8): 1750-1760. DOI: https://doi.org/10.1016/j.patcog.2011.02.009
[23] Bezdek JC, Ehrlich R, Full W. FCM: The fuzzy c-means clustering algorithm. Computers & geosciences. 1984; 10(2-3): 191-203. DOI: https://doi.org/10.1016/0098-3004(84)90020-7
[24] Vargas RR, Bedregal BR, Palmeira ES. A comparison between k-means, fcm and ckmeans algorithms. In: 2011 Workshop-School on Theoretical Computer Science. 2011. p.32-38. DOI: https://doi.org/10.1109/WEIT.2011.28
[25] Vargas RR, Freddo R, Galafassi C, Gass SL, Russini A, Bedregal B. Identifying pixels classified uncertainties ckmeansimage algorithm. In: Information Processing and Management of Uncertainty in Knowledge-Based Systems. 2018. p.429-440. DOI:10.1007/978-3-319-91479-4_36
[26] Jain AK, Dubes RC. Algorithms for Clustering Data. Prentice-Hall; 1988.
[27] Halkidi M, Batistakis Y, Vazirgiannis M. On clustering validation techniques. Journal of intelligent information systems. 2001; 17(2-3): 107-145. DOI: https://doi.org/10.1023/A:1012801612483
[28] Peres SM, Rocha T, Bíscaro HH, Madeo RC, Boscarioli C. Tutorial sobre Fuzzy-c-means e Fuzzy learning vector Quantization: Abordagens híbridas para tarefas de agrupamento e classificação. Revista de Informática Teórica e Aplicada. 2012; 19(1): 120-163. DOI: https://doi.org/10.22456/2175-2745.13764
[29] Dave RN. Validating Fuzzy partitions obtained through c-shells clustering. Pattern recognition letters. 1996; 17(6): 613-623. DOI: https://doi.org/10.1016/0167-8655(96)00026-8
[30] Fukuyama Y, Sugeno M. A new method of choosing the number of clusters for the Fuzzy c-mean method. The 5th Fuzzy Syst Symp. 1989; 247: 247-250.
[31] Bezdek JC, Pal NR. Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 1998; 28(3): 301-315. DOI: https://doi.org/10.1109/3477.678624
[32] da Silva LR, Arnaldo HA, da Silva H, Moura R, Bedregal B, de P Canuto AM. Two deterministic selection methods for the initial centers in Fuzzy c-means based algorithms. Intelligent Data Analysis. 2020; 24(4): 779-798. DOI: https://doi.org/10.3233/IDA-194588
[33] Silva L, Moura R, Canuto AM, Santiago RH, Bedregal B. An interval-based framework for ffuzzy clustering applications. IEEE Transactions on Fuzzy Systems. 2015; 23(6): 2174-2187. DOI: https://doi.org/10.1109/TFUZZ.2015.2407901
[34] Silva HM, Canuto AM, Medeiros IG, Xavier-Júnior JC. Cluster ensembles optimization using coral reefs optimization algorithm. In: Artificial Neural Networks and Machine Learning–ICANN 2016: 25th International Conference on Artificial Neural Networks. 2016; p.275-282. DOI: https://doi.org/10.1007/978- 3-319-44781-0_33

Share To

Article Url

‎A New Approach to Define the Number of Clusters for Partitional Clustering Algorithms

Sanad

Links

Related Centers

Technical Support

Official pages