Data clustering consists of grouping similar objects according to some characteristic. Among the many clustering algorithms in the literature, Fuzzy C-Means (FCM) stands out as one of the most widely discussed and is used in a variety of applications. Although FCM is a simple clustering method that is easy to work with, it requires the number of clusters as an initial parameter. This number is usually unknown beforehand, which becomes a relevant problem in the cluster analysis process. In this context, this work proposes a new methodology for determining the number of clusters for partitional algorithms, using subsets of the original data to define that number. The methodology is intended to reduce the side effects of the cluster definition phase, possibly shortening processing time and lowering computational cost. To evaluate the proposed methodology, different cluster validation indices are used to assess the quality of the clusters obtained by the FCM algorithm and some of its variants when applied to different datasets. The empirical analysis indicates that the results obtained in this article are promising from both an experimental and a statistical point of view.
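As a rough illustration of the general idea (choosing the number of clusters on a subset of the data rather than on the full dataset), the following minimal sketch runs a plain FCM on a random sample for several candidate values of c and keeps the one with the highest partition coefficient. It is not the methodology proposed in the article: the abstract does not say how the subsets are drawn or which validation indices are used, so the uniform random sampling, the partition coefficient, and the names `fcm`, `partition_coefficient`, `estimate_c_on_subset`, `fraction`, and `candidates` are all assumptions made for this example.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means on a list of equal-length tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k. Requires m > 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(iters):
        # Centers are means of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([
                sum(w[i] * points[i][j] for i in range(n)) / total
                for j in range(dim)
            ])
        # Memberships are proportional to squared distance ** (-1 / (m - 1)).
        shift = 0.0
        for i in range(n):
            sq = [
                max(sum((points[i][j] - centers[k][j]) ** 2
                        for j in range(dim)), 1e-12)
                for k in range(c)
            ]
            inv = [s ** (-1.0 / (m - 1.0)) for s in sq]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def partition_coefficient(u):
    """Mean of squared memberships; lies between 1/c (fuzzy) and 1 (crisp)."""
    return sum(v * v for row in u for v in row) / len(u)


def estimate_c_on_subset(points, candidates=(2, 3, 4, 5), fraction=0.3, seed=0):
    """Hypothetical helper: score each candidate c on a random subset only."""
    rng = random.Random(seed)
    size = max(max(candidates) + 1, int(fraction * len(points)))
    subset = rng.sample(points, min(size, len(points)))
    scores = {}
    for c in candidates:
        _, u = fcm(subset, c, seed=seed)
        scores[c] = partition_coefficient(u)
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `estimate_c_on_subset(data, candidates=(2, 3, 4), fraction=0.5)` returns the selected c together with the score of every candidate. The partition coefficient tends to favour small c, so in practice several validation indices would be compared, as the abstract describes.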