A Novel Data Clustering Method Based on Combining the Genetic and Firefly Optimization Algorithms

Subject areas: multimedia processing, communication systems, intelligent systems

1 - Department of Computer Engineering, Faculty of Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 - Department of Biomedical Engineering, Faculty of Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran

Received: 1400/03/29 Accepted: 1401/05/31 Published: 1400/10/01

Keywords: firefly genetic algorithm, firefly algorithm, genetic algorithm, data mining, k-means clustering

Abstract:

Clustering is one of the important problems in data mining: it partitions data into groups according to within-cluster similarity, without any predefined target. A common clustering method is the k-means algorithm, which takes the input data and divides it into k clusters. One drawback of this method is its sensitivity to initial conditions, which reduces clustering accuracy. Metaheuristic algorithms are one way to improve the performance of k-means. This study considers two optimization methods, the genetic algorithm and the firefly algorithm, and presents a new algorithm, called the firefly genetic algorithm, for optimizing k-means clustering. The firefly algorithm is a swarm intelligence algorithm inspired by the flashing light of fireflies, and the genetic algorithm is a metaheuristic that uses techniques from biology such as inheritance and mutation. Because k-means selects the cluster centers at random, its clustering lacks the required accuracy. By using metaheuristic algorithms, we try to obtain more accurate cluster centers and, as a result, correct clustering. In the proposed method, k-means is first run on the input data to produce a clustering. A multiple of the cluster centers obtained by this algorithm is then used as the lower and upper bounds of the proposed algorithm. The initial population is generated randomly between the lower and upper bounds. In the main loop, the population is divided into two equal groups: the genetic algorithm is run on the first group, and new positions are computed for the second group according to the firefly algorithm. The previous population, the new population produced by the genetic algorithm, and the new population produced by the firefly algorithm are then merged and sorted from best to worst; the required number of individuals is selected, and the loop starts again. This process continues until the stopping condition is met.
Finally, k-means, the firefly algorithm, the genetic algorithm, and the proposed algorithm are applied to three datasets and the results are compared. The simulation results show that the firefly genetic algorithm performs better than the other methods.
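
The main loop described in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the authors' implementation: the abstract does not specify the fitness function, the multipliers used for the bounds, the genetic operators, or the firefly parameters. The sum of squared distances as fitness, the 0.5x and 1.5x bounds, uniform crossover with Gaussian mutation, the random split into two halves, and the values of `beta0`, `gamma`, and `alpha` are all assumptions made here, and every function name is hypothetical.

```python
import numpy as np

def sse(flat, X, k):
    """Assumed fitness: squared distance of each point to its nearest center."""
    centers = flat.reshape(k, -1)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's k-means with randomly chosen initial centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def ga_step(pop, lo, hi, rng, mut_rate=0.1):
    """Assumed operators: uniform crossover with a random mate, Gaussian mutation."""
    n, dim = pop.shape
    mates = pop[rng.permutation(n)]
    children = np.where(rng.random((n, dim)) < 0.5, pop, mates)
    mutate = rng.random((n, dim)) < mut_rate
    children = children + mutate * rng.normal(0.0, 0.1, (n, dim)) * (hi - lo)
    return np.clip(children, lo, hi)

def firefly_step(pop, fit, lo, hi, rng, beta0=1.0, gamma=1.0, alpha=0.05):
    """Move each firefly toward every brighter (lower-cost) firefly."""
    scale = np.maximum(hi - lo, 1e-12)  # avoid division by zero
    new = pop.copy()
    for i in range(len(pop)):
        for j in range(len(pop)):
            if fit[j] < fit[i]:
                r2 = (((new[i] - pop[j]) / scale) ** 2).sum()
                beta = beta0 * np.exp(-gamma * r2)
                noise = alpha * (rng.random(pop.shape[1]) - 0.5) * scale
                new[i] = new[i] + beta * (pop[j] - new[i]) + noise
    return np.clip(new, lo, hi)

def hybrid_cluster(X, k, pop_size=20, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    base = kmeans(X, k, rng).ravel()
    # Assumed bounds: 0.5x and 1.5x the k-means centers (ordered per coordinate).
    lo = np.minimum(0.5 * base, 1.5 * base)
    hi = np.maximum(0.5 * base, 1.5 * base)
    pop = rng.uniform(lo, hi, size=(pop_size, base.size))
    fit = np.array([sse(p, X, k) for p in pop])
    half = pop_size // 2
    history = [fit.min()]
    for _ in range(iters):
        perm = rng.permutation(pop_size)  # assumed: random split into halves
        pop, fit = pop[perm], fit[perm]
        ga_new = ga_step(pop[:half], lo, hi, rng)
        ff_new = firefly_step(pop[half:], fit[half:], lo, hi, rng)
        new = np.vstack([ga_new, ff_new])
        new_fit = np.array([sse(p, X, k) for p in new])
        merged = np.vstack([pop, new])
        merged_fit = np.concatenate([fit, new_fit])
        keep = np.argsort(merged_fit)[:pop_size]  # sort best to worst, truncate
        pop, fit = merged[keep], merged_fit[keep]
        history.append(fit[0])
    best = fit.argmin()
    return pop[best].reshape(k, -1), fit[best], history
```

Because the previous population is merged with both new populations before selection, the best cost in this sketch can never get worse from one iteration to the next; a fixed iteration count stands in for the unspecified stopping condition.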

English abstract:

Introduction: With advances in technology and the growing volume of data in databases, demand for fast and accurate discovery and extraction of information from databases has increased. Clustering is a data mining approach that analyzes and interprets data by exploring its structure through similarities or differences. One of the most widely used clustering methods is k-means. In this algorithm, cluster centers are selected randomly and each object is assigned to the cluster whose center it is most similar to. As a result, the algorithm is not well suited to data containing outliers, since such data easily shifts the centers and may produce undesirable results. Using optimization methods to find the best cluster centers can therefore significantly improve the performance of this algorithm. The idea of combining the firefly and genetic algorithms to optimize clustering accuracy is an innovation that has not been used before.
Method: To optimize k-means clustering, this paper introduces a method that combines the genetic algorithm and the firefly algorithm, referred to as the firefly genetic algorithm.
Findings: The proposed algorithm is evaluated on three well-known datasets: Breast Cancer, Iris, and Glass. The results show that the proposed algorithm gives better results on all three datasets. The results confirm that the distance between clusters is much smaller than that of the compared approaches.
Discussion and Conclusion: The most important issue in clustering is determining the cluster centers correctly. A variety of methods and algorithms perform clustering, with differing performance. In this paper, a new data clustering method based on the firefly and genetic metaheuristic algorithms has been proposed. The main focus of this study was on two determining factors: the distance within each cluster (the distance of each data point to the center of its cluster) and the distance of the cluster centers from each other (the maximum distance between the centers of the clusters). In the k-means algorithm, clustering is not accurate because the cluster centers are selected randomly. By employing the firefly and genetic algorithms, we try to obtain more accurate cluster centers and, as a result, correct clustering.
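
The two factors named in the conclusion can be computed along these lines. This is a minimal sketch under stated assumptions: Euclidean distance is assumed, the within-cluster factor is taken as the sum of point-to-assigned-center distances, and the function name `cluster_distances` is hypothetical. The abstract does not say how the two quantities are combined into a single objective, so they are returned separately.

```python
import numpy as np

def cluster_distances(X, centers):
    """Return labels, total within-cluster distance, and the maximum
    distance between any two cluster centers (Euclidean, assumed)."""
    # Distance from every point to every center, shape (n_points, k).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Distance of each point to the center of its own cluster, summed.
    within = d[np.arange(len(X)), labels].sum()
    # Pairwise distances between centers; keep the largest one.
    between = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    return labels, within, between.max()

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])
labels, within, farthest = cluster_distances(X, centers)
print(labels, within, farthest)  # [0 0 1 1] 4.0 10.0
```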

