Automatic Spoken Digit Recognition Using a Deep Spiking Neural Network Based on Fuzzy Weighting
Subject areas: Computer Engineering and Information Technology
Melika Hamian 1 , Karim Faez 2 * , Soheila Nazari 3 , Maliheh Sabeti 4
1 - PhD student
2 - Department of Computer Engineering, North Tehran Branch, Islamic Azad University, Tehran, Iran
3 - Department of Computer Engineering, North Tehran Branch, Islamic Azad University, Tehran, Iran
4 - Department of Computer Engineering, North Tehran Branch, Islamic Azad University, Tehran, Iran
Keywords: digit recognition system, spiking neural network, fuzzy weighting system
Abstract:
Despite progress in the design of spiking neural networks (SNNs), training these systems for classification and other artificial intelligence applications remains a major challenge. In this paper, we investigate supervised learning in SNNs for recognizing and classifying digits from speech signals. The SNN is trained using fuzzy logic: the learning rule integrates a fuzzy weighting system (FWS) with spike-timing-dependent plasticity (STDP). Combined with the FWS, the STDP rule produces a random weight distribution in which the extent of the STDP window is controlled. During the training phase, the SNN uses a set of training neurons with fuzzy weighting to reduce the number of weights per neuron; data from all classes are fed to these neurons to determine the trained weights and to estimate the thresholds with the help of the wild horse optimization algorithm (WHO). The resulting weights are then passed to the neurons of the different layers so that, as the objective function, they reflect the similarities among the classes in the extracted features. A case study was carried out on a set of audio signals for digit classification. The proposed network achieved a classification accuracy of 98.17% on the TIDIGITS test database.
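The abstract does not give the update equations, so the following is only an illustrative sketch of how a pair-based STDP update might be combined with a fuzzy weighting factor and a bounded window. The triangular membership function, the function names, and all parameter values (`a_plus`, `a_minus`, `tau`, `window`) are assumptions made for illustration and are not taken from the paper.

```python
import math


def triangular_membership(x, width):
    """Hypothetical fuzzy membership: 1 at x = 0, falling linearly to 0 at |x| = width."""
    return max(0.0, 1.0 - abs(x) / width)


def stdp_delta(dt, a_plus=0.01, a_minus=0.012, tau=20.0, window=50.0):
    """Pair-based STDP weight change for dt = t_post - t_pre (in ms).

    Spike pairs farther apart than `window` produce no change (bounded window).
    The exponential STDP term is scaled by an assumed fuzzy membership value.
    """
    if dt == 0 or abs(dt) > window:
        return 0.0
    mu = triangular_membership(dt, window)
    if dt > 0:
        # Pre-synaptic spike precedes post-synaptic spike: potentiation.
        return mu * a_plus * math.exp(-dt / tau)
    # Post-synaptic spike precedes pre-synaptic spike: depression.
    return -mu * a_minus * math.exp(dt / tau)


def update_weight(w, dt, w_min=0.0, w_max=1.0):
    """Apply the update and clip the weight to [w_min, w_max]."""
    return min(w_max, max(w_min, w + stdp_delta(dt)))


if __name__ == "__main__":
    for dt in (-60.0, -10.0, 10.0, 60.0):
        print(dt, stdp_delta(dt))
```

In this sketch the membership value shrinks the update as the spike pair approaches the edge of the window, which is one simple way a fuzzy weight could control the effective STDP range; the paper's actual rule may differ.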