A Novel Approach in Text-Independent Speaker Recognition in Noisy Environment
Subject Areas : Signal and systems processing
Nona Heydari Esfahani 1 , Hamid Mahmoodian 2
1 - MSc – Persian Foolal Company, Isfahan
2 - Assistant Professor - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University
Keywords: Shannon entropy, MLP, speaker identification, MFCC coefficients, pitch frequency, formants
Abstract :
This paper addresses robust text-independent speaker recognition. The proposed method operates on utterances from which silence has been removed manually and which are then segmented into smaller speech units, each containing a few phones and at least one vowel. These segments are the basic units for long-term feature extraction. Sub-band entropy is extracted directly from each segment. A robust vowel detection method is then applied to each segment to isolate a high-energy vowel, which serves as the unit for pitch frequency and formant extraction. A clustering technique combines the extracted short-term features, namely MFCC coefficients, with the long-term features. Experiments with an MLP classifier show an average speaker recognition accuracy of 97.33% for clean speech and 61.33% in a noisy environment at -2 dB SNR, an improvement over other conventional methods.
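The abstract does not say how the sub-band entropy is computed, so the following is only a minimal sketch of one common definition: the power spectrum of a segment is split into equal-width bands, the power in each band is normalized to sum to one, and the Shannon entropy of that distribution is taken. The function names, the number of bands (`num_bands=4`), and the use of a naive DFT are assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_entropy(frame, num_bands=4):
    """Shannon entropy (in bits) of the normalized power inside each sub-band.

    Assumed definition, not necessarily the one used by the authors.
    """
    power = power_spectrum(frame)
    band_size = len(power) // num_bands
    entropies = []
    for b in range(num_bands):
        start = b * band_size
        # The last band absorbs any leftover bins.
        end = len(power) if b == num_bands - 1 else start + band_size
        band = power[start:end]
        total = sum(band)
        if total <= 0.0:
            # A band with no energy carries no information.
            entropies.append(0.0)
            continue
        probs = [p / total for p in band if p > 0.0]
        entropies.append(-sum(p * math.log2(p) for p in probs))
    return entropies


# Example inputs: a unit impulse (flat spectrum) and a pure tone at bin 8.
impulse = [1.0] + [0.0] * 63
tone = [math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]
impulse_entropy = subband_entropy(impulse)
tone_entropy = subband_entropy(tone)
```

Under this definition a flat spectrum gives the maximum entropy for a band (the base-2 logarithm of the number of bins in it), while a band dominated by a single spectral peak gives an entropy close to zero. In practice an FFT routine would replace the naive DFT for longer segments.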