A New Model-based Bald Eagle Search Algorithm with Sine Cosine Algorithm for Data Clustering

Soleimanian Gharehchopogh, Farhad; Rostampnah, Berivan

Manuscript ID : JACET-2111-1486 (R1) Visit : 329 Page: 177 - 186

Article Type: Original Research

A New Model-based Bald Eagle Search Algorithm with Sine Cosine Algorithm for Data Clustering

Subject Areas : Data Mining

Farhad Soleimanian Gharehchopogh ^{1
*} , Berivan Rostampnah ²

1 - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, IRAN
2 - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran

Received: 2021-11-27 Accepted : 2022-03-09 Published : 2021-08-01

Keywords: Bald Eagle Search Algorithm, Clustering, K-means, Sine-Cosine Algorithm,

Abstract :

Abstract— Clustering is one of the most popular techniques in unsupervised learning in which data is divided into different groups without any prior knowledge, and for this reason, clustering is used in various applications today. One of the most popular algorithms in the field of clustering is the k-means clustering algorithm. The most critical weakness of k-means clustering is that it is sensitive to initial values for parameterization and may stop at local minima. Despite its many advantages, such as high speed and ease of implementation due to its dependence on the initial parameters, this algorithm is in the optimal local configuration and does not always produce the optimal answer for clustering. Therefore, this paper proposes a new model using the Bald Eagle Search (BES) Algorithm with the Sine Cosine Algorithm (SCA) for clustering. The evaluation of the proposed model is based on the number of iterations, convergence, number of generations, and execution time on 8 UCI datasets. The proposed model is compared with Flower Pollination Algorithm (FPA), Crow Search Algorithm (CSA), Particle Swarm Optimization (PSO), and Sine-Cosine Algorithm (SCA). The results show that the proposed model has a better fit compared to other algorithms. According to the analysis, it can be claimed that the proposed model is about 10.26% superior to other algorithms and also has an extraordinary advantage over k-means.

References:

Full-Text:

Preparation of Papers for IJCET

Journal of Advances in Computer Engineering and Technology

A New Model-based Bald Eagle Search Algorithm with Sine Cosine Algorithm for Data Clustering

Received (Day Month Year)

Revised (Day Month Year)

Accepted (Day Month Year)

Abstract— Clustering is one of the most popular techniques in unsupervised learning in which data is divided into different groups without any prior knowledge, and for this reason, clustering is used in various applications today. One of the most popular algorithms in the field of clustering is the k-means clustering algorithm. The most critical weakness of k-means clustering is that it is sensitive to initial values for parameterization and may stop at local minima. Despite its many advantages, such as high speed and ease of implementation due to its dependence on the initial parameters, this algorithm is in the optimal local configuration and does not always produce the optimal answer for clustering. Therefore, this paper proposes a new model using the Bald Eagle Search (BES) Algorithm with the Sine Cosine Algorithm (SCA) for clustering. SCA is used to find the centers of the clusters and improve the centrality of the clusters obtained by the BES algorithm. Primary vectors are created based on the population of eagles, and then each vector is used SCA to search centers of the clusters. The evaluation of the proposed model is based on the number of iterations, convergence, number of generations and execution time on 8 UCI datasets. The proposed model is compared with Flower Pollination Algorithm (FPA), Crow Search Algorithm (CSA), Particle Swarm Optimization (PSO), and Sine-Cosine Algorithm (SCA). The results show that the proposed model has a better fit compared to other algorithms. According to the analysis, it can be claimed that the proposed model is about 10.26% superior to other algorithms and also has an extraordinary advantage over k-means.

Index Terms— Clustering, Bald Eagle Search Algorithm, Sine-Cosine Algorithm, K-means

I. INTRODUCTION

urrently, there is an exponential growth in the production and storage of data, and on the other hand, this data contains valuable hidden information that is very useful for the decision-making process about tasks [1]. Knowledge extraction into data is a challenge in the field of analysis and clustering. Clustering is one of the most significant fields of machine learning algorithms that have attracted most researchers and experts from a practical and theoretical point of view [2]. Clustering can be defined as classification without unsupervised allocating samples to different clusters [3, 4]. The primary purpose of data clustering is to group objects into similar and interdependent clusters [5].

The primary purpose of data clustering is that all samples, namely N, are considered as k-clusters. Each cluster contains samples that are similar in measurement, and different clusters are different from each other. Clustering analysis is an important technique used in data exploration, pattern recognition, machine learning, and various engineering issues. The essential clustering algorithms are partition-based clustering algorithm, hierarchical clustering, density-based clustering, grid-based clustering, and model-based clustering. Using the criterion defined as Euclidean distance in partition-based clustering algorithms, the samples are separated into k-clusters [6].

This paper uses the combination of the BES [7] and SCA [8] to cluster the data. The BES is one of the meta-heuristic algorithms developed to solve optimization problems [9-11]. Meta-heuristic algorithms in different fields of optimization have been able to perform well [12, 13]. The primary purpose of the proposed model is to cluster the data based on optimal and accurate center settings. The BES is a nature-based meta-heuristic algorithm that BES has been developed from hunting or intelligent social behavior of bald eagles when searching for fish. Hunting is divided into three stages by the BES. In the space selection, the eagle selects a space with the maximum number of baits. In the second stage (space search), the eagle moves inside the selected space to search for prey. In the third stage (swooping stage), the eagle swings from the best position specified in the second stage and determines the best point for hunting. The swing starts at the best point, and all other movements are directed to this point.

Various criteria, such as exploitation, exploration, convergence rate, and local optimums, are proposed in meta-heuristic algorithms to determine their strengths in solving problems. Hybridization is one of these ways, in which algorithms compensate for one other's flaws. As a result, the combination of SCA and BES has been employed in this research. In addition, meta-heuristic algorithms contain a variety of motion equations that allow them to do exploration and exploitation searches. In most cases, the movements are chosen at random in terms of certain parameters, which is inefficient. One of the greatest obstacles in these methods is that algorithms favor the profitable equation for updated solutions. By considering a vector for the agents’ movements, the SCA improves the efficiency of the updating process in BES. The SCA has a number of benefits over other population-based algorithms, including the ability to execute a robust search when dealing with large optimization problems. This algorithm combines the capabilities of local search and population-based optimization approaches into a single algorithm.

Among clustering techniques, the K-means algorithm is known as the most popular clustering algorithm [14]. This algorithm starts with k random centers and assigns each instance to the center that is closest to it. The centers in each cluster are recalculated in several iterations. Although the k-means algorithm is fast and straightforward, it has some initial problems for better clustering, including high dependence on the initial value, sensitivity to noise, and the creation of unbalanced clusters. In addition, in the process of minimizing the objective function, it may converge to local minima. Over the past two decades, meta-heuristic and evolutionary algorithms have been widely used to solve the clustering problem.

In the proposed model, the BES produces an optimal clustering based on minimizing the distance between the members of the cluster and the center of the clusters. Optimal point search is done using SCA. The main contributions of this paper are as follows:

· Find the centrality of the samples using the improved BES.

· Evaluation of the proposed model on 8 UCI datasets

· Comparison of the proposed model with meta-heuristic algorithms based on error rate and fitness function.

The general structure of this paper is organized as follows: Section 2 will review previous studies on data clustering. In Section 3; The proposed model is described using a combination of BES-SCA for data clustering. In Section 4; The evaluation of the proposed model is done on different datasets. Finally, the conclusion and future works are presented in Section 5.

II. Related Works

In this section, done several works about data clustering are reviewed. A hybrid model for data clustering is proposed based on the WOA and Tabu Search (TS) [15]. Twelve different datasets were used to evaluate the performance. The hybrid model has used an objective function to maintain the quality of the clustering solutions. Regardless of the size of the dataset, the hybrid model has been able to find high-quality centers in a few iterations, which has proven its ability to cover the problem space effectively. The hybrid model had a lower error rate than the PSO and the Imperialist Competitive Algorithm (ICA).

The Gray Wolf Optimization (GWO) and the PSO are two popular algorithms for optimizing swarm intelligence. In [16], a hybrid model is proposed based on the unique search mechanisms and the advantages of the two algorithms. The hybrid model was able to incorporate the benefits of both techniques, overcome their flaws, and improve clustering performance. The results showed that the error rate of the hybrid model was lower compared to k-means and the convergence of the hybrid model was more optimal compared to PSO and GWO.

A clustering method based on the CSA and opposition-based learning (OBL) is proposed [17]. The CSA is one of the meta-heuristic algorithms that have problems in the exploration and exploitation stage and is therefore sensitive to the initialization for the centrality of the clusters in the clustering problem. In the hybrid model, crows change their position based on the OBL method. The position of the crows is improved by using OBL to find the best center position for the clusters. To evaluate the performance of the proposed method, experiments were performed on eight datasets from the UCI repository and compared with seven different clustering algorithms. The results showed that the proposed method is more accurate, efficient and robust than other clustering algorithms. Also, the convergence of the proposed method is better compared to other algorithms.

An improved Bat Algorithm (BA) based on the WOA is used for data clustering [18]. The proposed method focuses on improving the efficiency of the BA. In the proposed method, instead of the random selection step, one answer is selected from the best answers and replaces some dimensions of the position vector in the bat algorithm. Some of the best solutions have been modified to reduce the siege mechanism and update the spiral with the WOA. Six datasets were used to examine the performance of the suggested technique in contrast to meta-heuristic algorithms in the subject of data clustering. These experiments show that the proposed method performed significantly better than the standard BA and performed better than the WOA. In general, the proposed method has been able to work more robust and better than the Harmony Search Algorithm (HAS), Artificial Bee Colony (ABC), WOA and BA.

A new clustering algorithm inspired by the Magnetic Force Optimization (MFO) algorithm [19] is proposed. This algorithm is not sensitive to the initialization of cluster centers—the position of the central particles changes according to the total magnetic force applied to the data points. Particle position updating is done by using magnetic force to find the best center particle position for the clusters. To evaluate the performance of the proposed model, statistical tests were performed on eleven datasets from the UCI repository, and five clustering algorithms were used for comparison. The results showed that the proposed model is more accurate, efficient, and robust than other clustering algorithms.

An improved Krill Algorithm (KA) is used to create a global search capability [20]. The improvement is based on adding a global search operator to explore the optimal search area to move krill particles towards the best global solution. An elitist strategy is also applied to maintain the best krill in the krill upgrade process. The proposed method is tested on twenty-six mathematical functions and six clustering datasets. The proposed method has a high convergence rate. The results showed that the value of the objective function on the Iris dataset was above 96%.

A new model is proposed based on the K-means algorithm and the Rice Optimization Algorithm (ROA) for data clustering [21]. A new hybrid algorithm based on the K-means algorithm is proposed to find cluster centers and avoid local optimization quickly. Experimental results showed that the combined clustering algorithm performed better than other similar algorithms.

A Multi-Objective Artificial Immune Algorithm (MOAIA) is used for fuzzy clustering based on multiple kernels [22]. The MOAIA improves the classical fuzzy clustering algorithm and overcomes some of its significant limitations, such as vulnerability in local convergence. Using the MOAIA and achieving an optimal solution set, the Pareto method is used. Initially, the degradation and premature convergence problems are prevented through initial antibody population, clone proliferation, and non-uniform mutation. In the end, the best solution is selected from the set of Pareto optimal solutions. Evaluation of the UCI dataset showed error reduction and optimal clustering.

An automated fuzzy clustering method based on the new version of the ABC algorithm (VABC) is proposed [23]. VABC allows the number of variable clusters. This method eliminates the need for a VABC-FCM-based Fuzzy C-Means (FCM) clustering technique with a predetermined number of clusters. In addition, VABC-FCM has a robust global search capability under logical parameterization. Some synthetic and natural datasets have been used to validate VABC-FCM performance. Experimental results showed that VABC-FCM can automatically evolve the optimal number of clusters and find a suitable fuzzy segmentation using the logical validity index for the data set. Finally, the performance of VABC-FCM is compared with that of Fuzzy Genetic Algorithm (VGA-FCM) clustering, PSO-FCM, and Differential Evolution (DE). The results showed that VABC-FCM performed better than VGA-FCM, PSO-FCM and DE-FCM in most cases.

The fuzzy technique has been widely used in data clustering. FCM advantages such as balancing the number of cluster points, moving cluster centers to optimal centers, and the presence of a fuzzy factor make it even more popular. However, the main limitation of FCM is the initial entrapment in the local minimums and the high sensitivity to setting up the cluster center. A new optimization approach based on a learning-teaching algorithm with an FCM clustering algorithm has been proposed to obtain appropriate values for cluster centers [24]. The simulation results of the proposed method are compared with other available methods such as GA, PSO and IPSO. Experimental results showed that the proposed method was superior to other methods in calculating the fit function.

Figure (1) shows a simple diagram to illustrate the whole idea of this paper.

Fig. 1. Steps of the proposed model

III. Proposed Model

K-means clustering is a widely used computationally efficient clustering method. Nevertheless, the k-means algorithm does not guarantee finding optimal solutions. This algorithm is also sensitive to the initial centers of the clusters. In addition, k-means suffers from the occurrence of empty clusters during different iterations. The k-means algorithm first assigns the initial values to the centers of the clusters in several iterations. In k-means, clusters are created in each iteration by assigning all data points to the nearest centers, and then the average of each cluster replaces the cluster centers. In this algorithm, the number of iterations is used as a stop measure, or the iterations continue until there is a change in the centers of the clusters. The proposed model uses the SCA for improving BES to find the centers of the clusters. The proposed model is a center-based approach. Thus, the main goal is to find the optimal cluster centers. Figure (2) shows the steps of the proposed model.

Fig. 2. Steps of the proposed model

Formulation of Clusters

There is a dataset that contains samples such as and number of sample dimensions, i.e., in data clustering. Each data sample is represented using , where represents the sample with the dimension. The data vector is defined as .

A clustering issue can be defined mathematically as an optimization problem. For a given number of clusters (or to be suitably chosen) K ∈ N, the objective is to minimize f (C). Cluster center vector C = (C1, C2, …, Ck), where D = {d1, d2, …, dn} is the dataset of samples to be grouped in K number of disjoint clusters C based on the similarity of each datum di(i = 1,2,…n) that is according to the selected objective function f (C).

The purpose of data clustering is to divide the dataset into K separate clusters as so that is defined as . Samples The data are assigned to the C cluster with the least distance.

In the proposed model, first, the Euclidean distance between each data sample and the cluster's center is calculated, and each sample is assigned to the nearest cluster. The fitness function is calculated based on the sum of the distances within the cluster between the data points and the center of the cluster to which they belong. The distance between the samples and the centers of the clusters is calculated according to Eq. (1).

(1)

As indicates the distance between the sample and the cluster. and represent the sample of the dataset (X) and k of the cluster (C), respectively. After calculating the distance between the samples and the centers of the clusters, the samples are assigned to the clusters with the minimum distance. The distance between the samples and the centers of the new clusters is determined in each iteration, and the samples are assigned to the closest clusters. The process is repeated until the optimal cluster centers, or termination conditions are found.

Pre-processing

If attributes with large values and variances are present in the dataset, then prevails on other attributes, leading to reduced clustering accuracy. One possible way to deal with this problem is to unify the data to ensure that each feature contributes equally to the distance. The preprocessing step is used to standardize the values of the dataset attributes. If the dataset is not in a particular range, it will reduce the accuracy. Normalization and standardization procedures are used to eliminate not just scale data, but also certain systemic biases that are inherent in the data. The biases arise from the interdependencies between qualities that may or may not have a normal distribution. Current normalizing approaches include min–max normalization, which treats each characteristic separately. In the proposed model, we perform the preprocessing operation based on Eq. (2). Therefore, by the unification method, based on the maximum, minimum data and all characteristics, the values of the samples can be transferred to the range [0, 1]. The parameter refers to the property of the instance.

(2)

Select Stage

In the select step, the eagles (agents) selects the best vector in the search space based on Eq (3). In the proposed model, the candidate solutions are defined as , where is the dimensions of the samples.

(3)

(4)

In Eq. (3), α is a parameter for controlling position changes defined according to Eq (4). So α is the average distance between the sample and the sample in a vector. In Eq. (4) is the number of samples in the vector. The parameter r is a random number that takes a value between 0 and 1. In the select step, the eagles select a point with help of information in the previous steps. The parameter indicates the search space that the eagles currently select based on the best position. Eagles randomly search all points close to the previous search space. The parameter indicates that these eagles have used all the information of the previous points, and the parameter contains all the vector points.

A solution is defined by a matrix (N´K) in which N is the number of instances in a dataset and K is the number of clusters. As shown in Eq. (5). A factor is represented as a matrix Such that k is the number of clusters and d is the dimension of the dataset. For example, if k = three and d = 5, then the length of the vector will be 15. Each row of the matrix represents the k center of the cluster of samples.

(5)

The agents will move towards optimization in a few iterations, and the factor population vector is updated in each iteration. The centers of the clusters are defined according to Eq. (6). Each center of the cluster indicates the position of the best eagle. Since denotes the center of belongs to the center of kth of the population, z is the dimensions of the dataset.

(6)

The fitness function is defined to evaluate the quality of each vector according to Eq. (7). The vector is selected as the optimal vector with the minimum result.

(7)

In Eq. (7), k and X represent the number of clusters and agents, respectively. represents the average of the cluster centers on the vector X. and are the centers of vectors, the number of which is determined by the number of clusters, and is the value of the elements of the vectors.

Hybrid Search Stage

In the search phase, the eagles search for the best points based on SCA operators in the selected search space and move in different directions in the spiral space to speed up their search. In this step, SCA is used to increase search efficiency. The exploration () and exploitation () operations are based on the SCA algorithm. The best position for diving is defined according to Eq. (8). Each factor selects the points that are closest to the other points. BES is a high-quality global optimum approach, however changes in the eagle population might result in the erasure of past information. SCA is employed to solve the aforementioned problem.

In Eq. (8), a takes a value between 5 and 10, used to determine the angle between the point search at the center point. R takes between 0.5 and 2 to determine the number of search cycles. The spiral space makes the points entirely in the eagle space and makes access to the points easier. The proposed model combines BES and SCA to benefit of efficient solution space search for data clustering. Each eagle in the population represents a full solution in the proposed model (a set of centers of the clusters).

Swooping Stage

During the swooping phase, the eagles move from the best position in the search space to the optimal point. All points also move to the best point. The diving stage is defined according to Eq. (9).

(8)
	;


	;

Evaluation Criteria

Evaluation functions are used to measure the overall quality of clusters. The most common estimation function used for clustering, which we use in this paper, is the Sum of Squares Error (SSE) function. The SSE is the square of the sum of the distances of all the instances of the data set to the center of the cluster in which they are located. The evaluation function of the sum of squares of error is defined according to Eq. (10). Where k is equal to the number of clusters and is the center of the j cluster [25].

(10)

Eq. (11) is used to calculate the standard deviation of the standard for clusters. Parameter c represents the number of clusters and is equal to the center of the cluster. If the standard deviation of a set of data is close to zero, it indicates that the data are close to each other and have little scatter; While a large standard deviation indicates significant data scatter.

(11)

The Error Rate (ER) equals the number of incorrect samples, divided by the total number of samples. The error rate is calculated according to Eq. (12) [26].

(12)

In Eq. (14), n represents the total number of samples, and and represent the datasets whose point is the member before and after clustering.

Best, Worst, and Mean are defined according to the following equations. BS is the best value obtained in each iteration. The best value in each iteration is calculated based on SSE.

(9)


	where

IV. Evaluation and Results

In this section, the details of the experiments and experimental obtained results from the algorithms in the MATLAB 2019 simulation environment are defined. In MATLAB environment, the datasets are read in Excel file format. According to the sensitivity analysis of the algorithm parameters, the best population size for eagles (BES) is 30, and the best value of the iteration parameter is 200 times. In this paper, to validate and evaluate the proposed model and compare its performance with other methods from 8 UCI datasets (https://archive.ics.uci.edu/ml/datasets.php) such as Glass, Vowel, CMC, Iris, Wine, Cancer, Seeds, Heart. These datasets are usually used as criteria for measuring the quality of the clustering process for algorithms.

Table I Characteristics of the datasets

	(13)
	(14)
	(15)

Sr. No.	Datasets	No. of classes	Categorical attributes	Size
D1	Glass	6	10	214
D2	Vowel	6	3	871
D3	CMC	3	10	1473
D4	Iris	3	4	150
D5	Wine	3	13	178
D6	Cancer	2	9	699
D7	Seeds	3	7	210
D8	Heart	2	13	270

In Table (2), the initialization of the parameters of the algorithms is performed.

Table II Initialization of parameters

Optimization of weighting-based approach to predict and deal with cold start of web recommender systems using cuckoo algorithm
Print Date : 2021-05-01
PEML-E: EEG eye state classification using ensembles and machine learning methods
Print Date : 2021-05-01
Scalable Fuzzy Decision Tree Induction Using Fast Data Partitioning and Incremental Approach for Large Dataset
Print Date : 2021-02-01
Alleviation of Cold Start in Movie Recommendation Systems using Sentiment Analysis of Multi-Modal Social Networks
Print Date : 2020-11-01
An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering
Print Date : 2020-05-01
Prediction of Student Learning Styles using Data Mining Techniques
Print Date : 2020-05-01

The rights to this website are owned by the Raimag Press Management System.
Copyright © 2021-2025

Value	Parameters	Algorithms
200	Iterations	K-means
50	Population	FPA
1.5	l
1	g
0.5	P
2		CSA
0.2	AP
0.99	a
0.01	b
[0,1]	q
0.9		PSO
0.4
2	C1
2	C2
2	a	SCA
[0,1], [0,2p], [0,2]
2		BES
Rand [0,1]	r
1.5	R

Table (III) compares the proposed model with other algorithms on different datasets based on the objective function. In reviewing the results, the best cost is considered as the primary criterion for evaluation. The proposed model is repeated on each dataset with 200 iterations. If the standard deviation (SD) is less, it indicates that the proposed models are converged to an immediate and optimal answer each time it is executed, and it is clear that the answer obtained by the proposed model has a lower cost and SD in comparison with other models. The simulations result for the datasets are shown in Table (III). From this table, the SCA performs as good as the proposed model in terms of Best, Mean and Worst values. The proposed model generally outperforms other models shown in Table (III).

[1]

Table III Comparison of the proposed models with other algorithms on different datasets

datasets	Criterion	k-means	FPA	CSA	PSO	SCA	proposed model
D1	Best	215.74	213.12	209.64	217.47	208.08	202.15
	Mean	235.50	214.27	209.91	217.86	208.68	202.76
	worst	255.38	216.04	215.35	217.94	208.16	203.64
	SD	12.4710	4.6584	4.2654	4.3546	3.2640	1.1362
D2	Best	149422.26	149251.73	149006.47	149010.51	148465.05	148347.06
	Mean	159242.89	152103.49	150325.28	156235.04	148756.72	148645.23
	worst	160028.81	157125.65	156015.22	159410.67	149214.16	148991.45
	SD	916.0042	651.5935	452.7308	347.3562	96.7605	8.5214
D3	Best	5842.20	5789.63	5768.03	5713.26	5703.98	5632.20
	Mean	5893.60	5791.58	5869.47	5713.98	5705.96	5635.50
	worst	5906.43	5849.64	5897.94	5769.33	5778.24	5699.99
	SD	47.1625	35.5470	32.9321	15.5910	8.9551	1.6254
D4	Best	97.33	98.98	97.21	96.23	96.21	95.25
	Mean	106.05	117.19	98.84	96.95	97.23	96.38
	worst	120.45	115.65	101.02	103.54	97.89	95.52
	SD	14.6311	12.2451	3.1020	0.5312	0.5841	0.0072
D5	Best	16555.68	16312.53	16378.48	16302.96	16324.35	16295.29
	Mean	18061.00	16530.52	16421.07	16417.46	16376.35	16295.35
	worst	18563.12	16530.53	16497.25	16562.31	16425.58	16297.72
	SD	793.2101	32.6202	86.0843	54.4974	25.2108	5.8542
D6	Best	2999.19	2986.32	2984.45	2928.50	2974.96	2915.31
	Mean	3251.21	3147.46	3068.17	3075.04	2973.24	2932.31
	worst	3521.59	3227.43	3221.95	3218.82	2990.83	2942.31
	SD	251.1436	229.7340	230.1921	105.2087	12.1247	0.1325
D7	Best	587.31	315.72	316.48	311.68	312.52	309.42
	Mean	588.10	318.62	317.85	313.85	313.36	310.13
	worst	589.04	319.35	318.13	313.85	313.55	311.08
	SD	30.6524	18.7126	6.2013	24.4036	7.2154	1.0051
D8	Best	10681.44	10642.89	10644.32	10611.35	10629.99	10589.56
	Mean	10688.64	10634.34	10601.25	10623.07	10656.36	10572.56
	worst	10700.83	10689.70	10685.95	10669.32	10659.70	10621.56
	SD	8.3298	8.0047	7.5107	0.1711	5.0047	0.2031

Based on the simulation results in Table (3), the proposed model, compared to other methods, has the highest quality in providing solutions, including the best, worst, and average intra-cluster distance for dataset samples. In all cases, the proposed model results are better and more desirable than other methods for each dataset. The small value for SD involves that the proposed model in the data clustering process for each dataset can discover a near-optimal solution in many independent implementations and has high power and ability to converge to the optimal solution. According to the results, it is clear that the k-means algorithm could not find the optimal centers. In general, the proposed model is better than the other five algorithms. Especially, on six datasets (D1, D3, D5, D6, D7, and D8), it converged fast indeed and obtained the global minimum. It can be concluded that the proposed model is an efficient algorithm for data clustering.

Error Rate

Table (4) shows, the proposed model eventually achieves the lowest error compared to other algorithms. The proposed model works better for searching all problem areas. As each agent with different initial values, it discovers the problem space differently. This ensures that the algorithm does not get stuck in local minima and increases the search process. Both the SCA and the proposed model show superiority in terms of the quality of solutions compared with other investigated models.

Table IV Comparison of the proposed model with other algorithms on different datasets

Datasets	k-means	FPA	CSA	PSO	SCA	proposed model
Glass	37.71	45.68	43.12	39.62	36.08	31.25
Vowel	44.26	45.75	43.27	45.68	43.57	39.89
CMC	54.49	54.98	54.63	54.41	53.87	52.31
Iris	13.67	11.37	10.16	11.18	10.35	10.05
Wine	31.12	29.19	29.45	28.15	29.25	27.95
Cancer	4.08	6.37	5.27	5.11	4.02	4.06
Seeds	11.55	13.38	11.39	10.36	10.76	9.78
Heart	37.81	25.86	24.15	16.82	15.08	10.65

The error rate in the proposed model for the CMC, Wine, Seeds datasets are 52.31, 27.95, and 9.78, respectively. According to the obtained results, it can be stated that the proposed model is suitable for data clustering and has a high ability to find the centrality of clusters and discover similar samples. The error rate in the proposed model is lower than other algorithms, and the error rate is lower than others.

Execution Time

In this section, the execution time (seconds) of the algorithms is examined. According to the results in Table (V), it is clear that the proposed model has more time. But the results showed that the proposed model has a lower error rate. Also, the fitness of the proposed model was better in comparison with other algorithms. The SCA algorithm has a high ability to detect answers in less time.

Table V Comparison of the proposed model with other algorithms based on execution time (seconds)

Datasets	k-means	FPA	CSA	PSO	SCA	proposed model
Glass	35	27	26	25	24	28
Vowel	38	27	25	27	25	29
CMC	41	28	26	27	26	29
Iris	17	11	8	9	7	14
Wine	28	24	19	20	18	25
Cancer	12	15	10	10	7	11
Seeds	12	14	9	10	7	10
Heart	15	14	11	11	7	10

V. Conclusions And Future Works

The k-means algorithm is sensitive to initialization to cluster centers; Thus, improper quantification leads to a slow convergence rate or non-optimal convergence. In this paper, a model for data clustering using a combination of BES and SCA was proposed. In the proposed model, the centers of the clusters were determined using eagle search among the data points. Evaluation of eight different datasets showed that the proposed model had a lower error rate than the FPA, CSA, PSO, and SCA. The proposed model has good convergence in achieving the optimal solution. In addition to maintaining accuracy in the proposed model, the time to reach the optimal solution was low, and the clusters were updated at high speed. The performances with 200 repetitions indicate the optimality of the proposed model on all of the datasets. Also, the number of generations showed that the proposed model had minor errors compared to other algorithms.

Due to the breadth and scope of clustering applications, it is impossible to create a comprehensive general algorithm for all applications. In future studies, we will try to use other population-based meta-heuristic algorithms and introduce new features for the BES, including parallelization of agent activity, improved eagle position, and improved swooping mechanism, and the possibility of linking agents and inspiring clustering methods different types of samples in high volume will be clustered.

References

[1] Ilango S.S., Vimal S., Kaliappan M., and Subbulakshmi P., 2019. Optimization using Artificial Bee Colony based clustering approach for big data, Cluster Computing. Vol. 22, No. 5, pp. 12169-12177.

[2] kumar Y., Sahoo G., 2017. A two-step artificial bee colony algorithm for clustering, Neural Computing and Applications, Vol. 28, No. 3, pp. 537-551.

[3] Singh H., Kumar Y., and Kumar S., 2019. A new meta-heuristic algorithm based on chemical reactions for partitional clustering problems, Evolutionary Intelligence. Vol. 12, No. 2, pp. 241-252.

[4] Sharma M., Chhabra J.K., 2021. An efficient hybrid PSO polygamous crossover-based clustering algorithm. Evolutionary Intelligence, Vol. 14, No. 3, pp. 1213-1231.

[5] Ezugwu A.E., Shukla A.K., Agbaje M.B., Oyelade O.N., Jose-Garcia A., and Agushaka J.O., 2021. Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature. Neural Computing and Applications, Vol. 33, No. 11, pp. 6247-6306.

[6] Hancer E., Xue B., Zhang M., 2020. A survey on feature selection approaches for clustering, Artificial Intelligence Review, Vol. 53, No. 6, pp. 4519-4545.

[7] Alsattar H.A., Zaidan A.A., and Zaidan B.B., 2020. Novel meta-heuristic bald eagle search optimisation algorithm, Artificial Intelligence Review, Vol. 53, No. 3, pp. 2237-2264.

[8] Mirjalili S., 2016. SCA: A Sine Cosine Algorithm for solving optimization problems, Knowledge-Based Systems, Vol. 96, No. 1, pp. 120-133.

[9] Gharehchopogh F.S. and Gholizadeh H., 2019. A comprehensive survey: Whale Optimization Algorithm and its applications. Swarm and Evolutionary Computation, Vol. 48, No. 1, pp. 1-24.

[10] Shayanfar H. and Gharehchopogh F.S., 2018. Farmland fertility: A new metaheuristic algorithm for solving continuous optimization problems. Applied Soft Computing, Vol. 71, No. 1, pp. 728-746.

[11] Abdollahzadeh B., Gharehchopogh F.S., and Mirjalili S., 2021. African vultures optimization algorithm: A new nature-inspired metaheuristic algorithm for global optimization problems. Computers & Industrial Engineering, Vol. 158, No. 1, pp. 107408.

[12] Goldanloo M.J. and Gharehchopogh F.S., 2021. A hybrid OBL-based firefly algorithm with symbiotic organisms search algorithm for solving continuous optimization problems. The Journal of Supercomputing, Vol. 18, No. 1. pp. 1-23.

[13] Ghafori S. and Gharehchopogh F.S., 2021. Advances in Spotted Hyena Optimizer: A Comprehensive Survey, Archives of Computational Methods in Engineering. Vol. 10, No. 1, pp. 1-26.

[14] Rahnema, N., & Gharehchopogh, F. S., 2020. An improved artificial bee colony algorithm based on whale optimization algorithm for data clustering. Multimedia Tools and Applications, Vol. 79, No. 43, pp.32169-32194.

[15] Ghany K.K.A., AbdelAziz A.M., Soliman T.H.A., Sewisy A.A.E.-M., 2020. A hybrid modified step Whale Optimization Algorithm with Tabu Search for data clustering. Journal of King Saud University - Computer and Information Sciences, Vol. 1, No. 1, pp. 1-8.

[16] Zhang X., Lin Q., Mao W., Liu S., Dou Z., and Liu G., 2021. Hybrid Particle Swarm and Grey Wolf Optimizer and its application to clustering optimization. Applied Soft Computing, Vol. 101, No. 1, pp. 1-23.

[17] Jafari Jabal Kandi R. and Gharehchopogh F.S., 2020. An improved opposition-based Crow Search Algorithm for Data Clustering. Journal of Advances in Computer Research, Vol. 11, No. 4, pp. 1-22.

[18] Damya N. and Gharehchopogh F.S., 2020. An Improved Bat Algorithm based on Whale Optimization Algorithm for Data Clustering. Journal of Advances in Computer Engineering and Technology, Vol. 6, No. 4, pp. 201-210.

[19] Kushwaha N., Pant M., Kant S., and Jain V.K., 2018. Magnetic optimization algorithm for data clustering. Pattern Recognition Letters, Vol. 115, No. 1, pp. 59-65.

[20] Jensi R., Jiji GW,2016. An improved krill herd algorithm with global exploration capability for solving numerical function optimization problems and its application to data clustering, Applied Soft Computing, Vol. 46, No. 1, pp. 230-245.

[21] Liu C., Wang C., Hu J., Ye Z., 2017. Improved K-means algorithm based on hybrid rice optimization algorithm, in 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS).

[22] Shang R., Zhang W., Li F., Jiao L., Stolkin R., 2019. Multi-objective artificial immune algorithm for fuzzy clustering based on multiple kernels. Swarm and Evolutionary Computation, Vol. 50, No. 1, pp. 100485.

[23] Su Z.G., Wang P.H., Shen J., Li Y.G., Zhang Y.F., Hu E.J., 2012. Automatic fuzzy partitioning approach using Variable string length Artificial Bee Colony (VABC) algorithm. Applied Soft Computing. Vol. 12, No. 11, pp. 3421-3441.

[24] Nayak J., Naik B., Kanungo D.P., Behera H.S., 2018. A hybrid elicits teaching learning-based optimization with fuzzy c-means (ETLBO-FCM) algorithm for data clustering. Ain Shams Engineering Journal, Vol. 9, No. 3, pp. 379-393.

[25] Niknam T., Olamaei J., Amiri B., 2008. A Hybrid Evolutionary Algorithm Based on ACO and SA for Cluster Analysis. Journal of Applied Sciences, Vol. 8, No. 1, pp. 2695-2702.

[26] Kao Y.-T., Zahara E., Kao I.W., 2008. A hybridized approach to data clustering, Expert Systems with Applications. Vol. 34, No. 3, pp. 1754-1762.

Share To

Article Url

A New Model-based Bald Eagle Search Algorithm with Sine Cosine Algorithm for Data Clustering