Sub-band Information Fusion Based on Wavelet Thresholding for Robust Speech Recognition
Subject Areas: Journal of Computer & Robotics
Babak Nasersharif
(School of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran;
Audio and Speech Processing Lab, Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran)
Ahmad Akbari
(Audio and Speech Processing Lab, Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran)
Keywords: Recognition, Wavelet, Sub-band, Likelihood Combination
Abstract:
In recent years, sub-band speech recognition has proven useful for improving the robustness of speech recognition, especially for speech contaminated by band-limited noise. In sub-band speech recognition, the full-band speech signal is divided into several frequency sub-bands, and the recognition result is obtained by combining either the sub-band feature vectors or the likelihoods produced by the corresponding sub-band recognizers. In this paper, we use the discrete wavelet transform to divide the speech signal into sub-bands. We also employ robust features within the sub-bands in order to obtain a higher sub-band speech recognition rate. In addition, we propose a likelihood weighting and fusion method based on the wavelet thresholding technique. The experimental results indicate that the proposed weighting methods for likelihood combination and classifier fusion improve the sub-band speech recognition rate in noisy conditions.
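The two wavelet operations the abstract relies on can be illustrated in a few lines. The sketch below is not the paper's implementation; it uses a hand-coded one-level Haar transform (rather than the paper's unspecified wavelet) and Donoho's soft-thresholding rule with the universal threshold, applied to a toy signal frame:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform.
    Splits the signal into a low-frequency (approximation) and a
    high-frequency (detail) sub-band, each half the input length."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass sub-band
    detail = (even - odd) / np.sqrt(2.0)   # high-pass sub-band
    return approx, detail

def soft_threshold(coeffs, thr):
    """Donoho's soft-thresholding rule: shrink each coefficient toward
    zero by thr, zeroing those whose magnitude is below thr."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)

# Toy frame: a slow ramp plus small high-frequency noise.
rng = np.random.default_rng(0)
frame = np.linspace(0.0, 1.0, 64) + 0.05 * rng.standard_normal(64)
approx, detail = haar_dwt(frame)

# Universal threshold sigma * sqrt(2 * log N), with the noise level sigma
# estimated from the median absolute deviation of the detail coefficients.
sigma = np.median(np.abs(detail)) / 0.6745
thr = sigma * np.sqrt(2.0 * np.log(frame.size))
detail_denoised = soft_threshold(detail, thr)
```

Deeper decompositions (more sub-bands) follow by applying `haar_dwt` recursively to the approximation band.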
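Likelihood combination across sub-band recognizers can be sketched as a weighted sum of per-band log-likelihoods. All numbers below are illustrative assumptions, not results from the paper; in the paper's method the weights would be derived from wavelet thresholding (e.g. reflecting how noisy each sub-band is), whereas here they are simply fixed:

```python
import numpy as np

# Hypothetical log-likelihoods from three sub-band recognizers,
# each scoring the same utterance against two candidate word models.
log_liks = np.array([
    [-120.0, -135.0],   # sub-band 1
    [-200.0, -190.0],   # sub-band 2
    [-150.0, -160.0],   # sub-band 3
])

# Illustrative reliability weights (assumed values summing to 1):
# a cleaner sub-band receives a larger weight.
weights = np.array([0.5, 0.2, 0.3])

# Fusion: weighted sum of sub-band log-likelihoods per candidate model,
# then pick the candidate with the highest combined score.
combined = weights @ log_liks
best = int(np.argmax(combined))
```

Here the fusion is linear in the log domain, i.e. a weighted geometric mean of the sub-band likelihoods, which is a common baseline for multi-band recognizers.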