Parallel Shared Hidden Layers Auto-encoder as a Cross-Corpus Transfer Learning Approach for Unsupervised Persian Speech Emotion Recognition

Pourebrahim, Yousef; Razzazi, Farbod; Sameti, Hossein

Article number: SPRE-2104-1139 (R1); Pages: 83-106

20.1001.1.25887327.2021.5.4.6.1

Manuscript type: Research


Subjects: Communication

Yousef Pourebrahim¹, Farbod Razzazi², Hossein Sameti³

1 - Department of Electrical and Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 - Department of Electrical and Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
3 - Speech Processing Laboratory, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

Submitted: Saturday, 12 Ramadan 1442; Accepted: Friday, 2 Shawwal 1442; Published: Wednesday, 26 Rabi' al-Thani 1443

Keywords: unsupervised classification, deep neural networks, transfer learning, emotional speech recognition

Abstract:

Detecting emotions from speech is one of the challenging topics in speech signal processing, especially for low-resource languages. Extracting features common to the training and test sets with an unsupervised method can alleviate the mismatch between training and test data. In this study, a new auto-encoder-based structure is proposed as an unsupervised domain adaptation method. The proposed structure consists of shared encoders that learn common feature representations across the source-domain and target-domain datasets, thereby minimizing the discrepancy between them. To evaluate the proposed method, five publicly available databases in different languages were used as training and test datasets. Results in various scenarios demonstrated that the proposed method significantly improves classification performance compared with the baseline and with state-of-the-art unsupervised domain adaptation methods for emotional speech recognition. For example, when EMOVO was used as the source training set, the proposed method improved the emotion recognition rate on the Persian emotional speech dataset (PESD) by 8% compared with cross-corpus training.
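The abstract describes the core idea only at a high level: an encoder shared by the source and target corpora learns a common representation from unlabeled data. The following is a minimal NumPy sketch of that general idea, not the authors' architecture. Everything in it is an assumption for illustration: a single tanh hidden layer shared by both domains, one linear decoder per domain (one possible reading of "parallel"), a summed squared-error reconstruction loss, plain full-batch gradient descent, and the hypothetical helper names `init_params`, `encode`, `loss_and_grads`, and `train`. Random matrices stand in for acoustic feature vectors.

```python
import numpy as np

# Hypothetical sketch of a shared-encoder auto-encoder for unsupervised domain
# adaptation. Layer sizes, activations, loss, and optimizer are assumptions,
# not the configuration used in the paper.

def init_params(n_in, n_hidden, rng, scale=0.1):
    """One shared encoder plus one decoder per domain (assumed design)."""
    return {
        "W_enc": rng.normal(0.0, scale, (n_in, n_hidden)),
        "b_enc": np.zeros(n_hidden),
        "W_dec_src": rng.normal(0.0, scale, (n_hidden, n_in)),
        "b_dec_src": np.zeros(n_in),
        "W_dec_tgt": rng.normal(0.0, scale, (n_hidden, n_in)),
        "b_dec_tgt": np.zeros(n_in),
    }

def encode(p, X):
    """Map features from either corpus into the shared hidden space."""
    return np.tanh(X @ p["W_enc"] + p["b_enc"])

def loss_and_grads(p, X_src, X_tgt):
    """Summed per-domain reconstruction error and its gradients."""
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    total = 0.0
    for X, dom in ((X_src, "src"), (X_tgt, "tgt")):
        n = X.shape[0]
        H = encode(p, X)                               # shared encoder
        W_dec, b_dec = p[f"W_dec_{dom}"], p[f"b_dec_{dom}"]
        E = H @ W_dec + b_dec - X                      # domain-specific linear decoder error
        total += np.sum(E ** 2) / n                    # squared error summed over features, averaged over samples
        dR = 2.0 * E / n
        grads[f"W_dec_{dom}"] += H.T @ dR
        grads[f"b_dec_{dom}"] += dR.sum(axis=0)
        dZ = (dR @ W_dec.T) * (1.0 - H ** 2)           # backprop through tanh
        # Both domains contribute to the gradient of the shared encoder.
        grads["W_enc"] += X.T @ dZ
        grads["b_enc"] += dZ.sum(axis=0)
    return total, grads

def train(X_src, X_tgt, n_hidden=8, lr=0.01, epochs=300, seed=0):
    """Full-batch gradient descent on unlabeled source and target features."""
    rng = np.random.default_rng(seed)
    p = init_params(X_src.shape[1], n_hidden, rng)
    history = []
    for _ in range(epochs):
        loss, g = loss_and_grads(p, X_src, X_tgt)
        history.append(loss)
        for k in p:
            p[k] -= lr * g[k]
    return p, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_src = rng.normal(0.0, 1.0, (64, 20))   # stand-in for source-corpus features
    X_tgt = rng.normal(0.5, 1.2, (48, 20))   # shifted stand-in for target-corpus features
    params, hist = train(X_src, X_tgt)
    print(f"loss: {hist[0]:.2f} -> {hist[-1]:.2f}")
    print("shared target codes:", encode(params, X_tgt).shape)
```

Because only unlabeled features enter the reconstruction loss, the target corpus needs no emotion labels. In a typical pipeline of this kind, `encode` would then map both corpora into the shared space, and a classifier trained on labeled source features would be applied to the encoded target features; that downstream step is likewise an assumption rather than a description of the paper's setup.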
