Neural Kalman Filter Application to Speech Enhancement
Subject Areas: Electrical Engineering
1: Department of Computer and Electrical Engineering, Has.C., Islamic Azad University, Hashtgerd, Iran.
Keywords: Speech Enhancement, Extended Kalman Filter, Neural Network.
[1] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," in IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, July 2001, doi: 10.1109/89.928915.
[2] I. Cohen and B. Berdugo, "Noise estimation by minima controlled recursive averaging for robust speech enhancement," in IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, Jan. 2002, doi: 10.1109/97.988717.
[3] I. Cohen, "Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging," in IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, pp. 466-475, Sept. 2003, doi: 10.1109/TSA.2003.811544.
[4] P. C. Loizou, "Speech enhancement based on perceptually motivated bayesian estimators of the magnitude spectrum," in IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 857-869, Sept. 2005, doi: 10.1109/TSA.2005.851929.
[6] Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, December 1984, doi: 10.1109/TASSP.1984.1164453.
[7] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, April 1985, doi: 10.1109/TASSP.1985.1164550.
[8] H. G. Hirsch and C. Ehrlicher, "Noise estimation techniques for robust speech recognition," 1995 International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, USA, 1995, pp. 153-156 vol.1, doi: 10.1109/ICASSP.1995.479387.
[9] R. Jaiswal, "Speech Activity Detection under Adverse Noisy Conditions at Low SNRs," 2021 6th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 2021, pp. 97-101, doi: 10.1109/ICCES51350.2021.9488934.
[10] C. Medina, R. Coelho and L. Zão, "Impulsive Noise Detection for Speech Enhancement in HHT Domain," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2244-2253, 2021, doi: 10.1109/TASLP.2021.3093392.
[11] R. Kumar and P. V. Subbaiah, "Enhancement of noisy speech using sub-band harmonic regeneration and speech presence uncertainty estimator," 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 2016, pp. 456-460, doi: 10.1109/RTEICT.2016.7807862.
[12] K. N. SunilKumar and Shivashankar, "A review on security and privacy issues in wireless sensor networks," 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 2017, pp. 1979-1984, doi: 10.1109/RTEICT.2017.8256945.
[13] R. Gatti and S. Hivashankar, "Study of different resource allocation scheduling policy in advanced LTE with carrier aggregation," 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 2017, pp. 2257-2261, doi: 10.1109/RTEICT.2017.8257002.
Journal of Applied Dynamic Systems and Control, Vol. 8, No. 2, 2025: 1-6
Neural Kalman Filter Application to Speech Enhancement
Amir Moghtadaei Rad1*
1* Corresponding Author: Department of Computer and Electrical Engineering, Has.C., Islamic Azad University, Hashtgerd, Iran. Email: Amir.Moghtadaei@iau.ac.ir
Received: 2025.02.06; Accepted: 2025.05.14
Abstract– Speech enhancement is needed to improve the performance of communication systems. It benefits speech recognition systems in a broad range of settings, including aviation, military, telecommunication, and cellular environments. Improving speech quality can also reduce listener fatigue in noisy environments. In this paper, an extended Kalman filter (EKF) is used to enhance speech quality and intelligibility. Accurate estimation is essential for speech detection; however, a linear filter cannot estimate nonlinear systems, whereas most real systems, including speech production, are nonlinear. Therefore, using the EKF, the speech signal is modelled and estimated under a nonlinearity assumption to achieve speech enhancement, and the results are presented.
Keywords: Speech Enhancement, Extended Kalman Filter, Neural Network.
1. Introduction
Since the Kalman filter was introduced in 1960, researchers have used it to estimate the dynamics of linear systems. The main problem, however, is that most real systems are nonlinear, and the linear Kalman filter cannot estimate them. Speech production, for instance, is a nonlinear system; because of this nonlinearity and the restriction of the linear Kalman filter to linear models, it could not be estimated until the extended Kalman filter was introduced.
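The extended Kalman filter produces approximately optimal estimates of the states of a nonlinear system by linearizing the nonlinear model about the current estimate; here, that nonlinear model is obtained with a neural network [1-3].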
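As a minimal sketch of how an EKF handles a nonlinear model, the Python code below performs one generic predict/update cycle. It assumes additive Gaussian process and measurement noise and takes the nonlinear functions `f` and `h` and their Jacobians `F_jac` and `H_jac` as arguments; these names, and the toy scalar system in `run_demo`, are hypothetical and are not the paper's implementation. In the approach described here, `f` would be the neural-network model of the speech system.

```python
import numpy as np


def ekf_step(x, P, y, f, F_jac, h, H_jac, Q, R):
    """One predict/update cycle of an extended Kalman filter (additive noise).

    x, P         : previous state estimate (1-D array) and its covariance
    y            : current measurement (1-D array)
    f, h         : nonlinear state-transition and measurement functions
    F_jac, H_jac : functions returning the Jacobians of f and h at a state
    Q, R         : process and measurement noise covariance matrices
    """
    # Predict: propagate the mean through the nonlinear model and the
    # covariance through its first-order (Jacobian) linearization.
    F = F_jac(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q

    # Update: linearize the measurement model around the predicted state.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


def run_demo(n_steps=500, seed=0):
    """Hypothetical toy system: x(k+1) = 0.9 sin(x(k)) + w(k), y(k) = x(k) + v(k).

    Returns the mean squared error of the raw measurements and of the
    EKF estimates with respect to the true state.
    """
    rng = np.random.default_rng(seed)
    q, r = 0.01, 0.1                         # process / measurement noise variances
    f = lambda x: 0.9 * np.sin(x)
    F_jac = lambda x: np.array([[0.9 * np.cos(x[0])]])
    h = lambda x: x
    H_jac = lambda x: np.eye(1)
    Q, R = np.array([[q]]), np.array([[r]])

    x_true = np.array([1.0])
    x_est, P = np.zeros(1), np.eye(1)
    err_meas, err_est = [], []
    for _ in range(n_steps):
        x_true = f(x_true) + rng.normal(0.0, np.sqrt(q), 1)
        y = h(x_true) + rng.normal(0.0, np.sqrt(r), 1)
        x_est, P = ekf_step(x_est, P, y, f, F_jac, h, H_jac, Q, R)
        err_meas.append((y[0] - x_true[0]) ** 2)
        err_est.append((x_est[0] - x_true[0]) ** 2)
    return np.mean(err_meas), np.mean(err_est)
```

Running `run_demo()` returns the mean squared error of the raw measurements and of the EKF estimates; for this toy system the filtered error should be noticeably lower. The only nonlinear-specific step is the use of Jacobians in the covariance propagation; with linear `f` and `h`, the same code reduces to the ordinary Kalman filter.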
Speech enhancement aims to improve both quality and intelligibility. Quality refers to how free the speech is of noise, and intelligibility refers to the percentage of words in a sentence that a listener understands correctly. Noise estimation is a crucial part of speech enhancement. Researchers have proposed many methods for nonlinear systems, each with its own advantages and disadvantages, but few nonlinear methods have been applied to speech enhancement. In this paper, we introduce a simple algorithm that identifies the nonlinear speech system with a neural network and then improves noisy speech signals with an extended Kalman filter (EKF) estimation algorithm. At the end of the paper, clean, noisy, and enhanced signals are compared, and the comparative analysis shows a significant improvement with the proposed method.
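One common objective way to compare clean, noisy, and enhanced signals is the signal-to-noise ratio (SNR) of each signal relative to the clean reference. This section does not state which measure the paper uses, so the helpers below (`snr_db` and `snr_improvement_db`, both hypothetical names) are an illustrative assumption rather than the paper's evaluation procedure. SNR reflects quality in the sense defined above, not intelligibility.

```python
import numpy as np


def snr_db(clean, estimate):
    """SNR (dB) of `estimate` relative to the clean reference.

    Everything in `estimate` that differs from `clean` is counted as noise.
    """
    clean = np.asarray(clean, dtype=float)
    residual = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))


def snr_improvement_db(clean, noisy, enhanced):
    """Gain in SNR (dB) of the enhanced signal over the noisy input."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)
```

For example, `snr_db([1, -1, 1, -1], [1.1, -0.9, 1.1, -0.9])` gives 20 dB, because the residual energy is 1/100 of the signal energy.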
2. Speech Identification
Speech production also falls within the field of nonlinear system identification. Because the dynamics of a nonlinear system vary over time and cannot be predicted with certainty, linear system identification methods are not applicable. Neural networks are therefore used for this modelling, as they are well suited to it.
Because of their learning ability, neural networks and fuzzy neural networks are among the principal methods for nonlinear system identification and prediction. Such a network can run in parallel with the actual system and, after a number of training iterations, learn its behaviour. It can then mimic the actual system and be used in its place as the system model. This ability makes it possible to model the dynamics of many nonlinear and complex real systems with neural networks [4],[5].
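The identification idea can be sketched, under stated assumptions, as a nonlinear autoregressive model s(k) = g(s(k-1), ..., s(k-p)) + w(k), where the unknown function g is learned by a one-hidden-layer tanh network from past samples of the signal. The window length `p`, the hidden size, the learning rate, plain full-batch gradient descent, and the helper names are hypothetical choices made for illustration; this section does not specify the paper's network architecture or training procedure.

```python
import numpy as np


def make_windows(signal, p):
    """Build (inputs, targets): each input row holds the p previous samples, newest first."""
    X = np.array([signal[k - p:k][::-1] for k in range(p, len(signal))])
    t = signal[p:]
    return X, t


def train_narx_mlp(signal, p=4, hidden=16, lr=0.05, epochs=500, seed=0):
    """Fit a tanh network predicting s(k) from s(k-1), ..., s(k-p).

    Uses full-batch gradient descent on the mean squared one-step
    prediction error. Returns the parameters and the loss history.
    """
    rng = np.random.default_rng(seed)
    X, t = make_windows(np.asarray(signal, dtype=float), p)
    W1 = rng.normal(0.0, 0.5, (p, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(t)
    losses = []
    for _ in range(epochs):
        a = np.tanh(X @ W1 + b1)            # hidden activations, shape (n, hidden)
        err = a @ w2 + b2 - t               # one-step prediction errors, shape (n,)
        losses.append(np.mean(err ** 2))
        # Backpropagate the gradient of the mean squared error.
        g_pred = 2.0 * err / n
        g_w2 = a.T @ g_pred
        g_b2 = g_pred.sum()
        g_z = np.outer(g_pred, w2) * (1.0 - a ** 2)
        g_W1 = X.T @ g_z
        g_b1 = g_z.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (W1, b1, w2, b2), losses


def predict(params, window):
    """One-step prediction from a window of p past samples (newest first)."""
    W1, b1, w2, b2 = params
    return np.tanh(window @ W1 + b1) @ w2 + b2
```

Once trained, `predict` plays the role of the identified system: given the p most recent samples, it returns the predicted next sample, which is the kind of state-transition function a Kalman-type estimator needs.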
2.1 Voice State Space Representation
As Figure 1 shows, the signals of interest are the states of the nonlinear speech-production system. Process noise is the system input, and the output is a noisy signal corrupted by measurement noise.
Fig. 1: The signal is generated by an AR model and corrupted by additive measurement noise.
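The data model of Fig. 1 can be simulated in a few lines. The sketch below assumes a linear AR(p) model with user-supplied coefficients and zero initial conditions, driven by white Gaussian process noise and observed through white Gaussian measurement noise; the function name `simulate_ar_with_noise` and the example coefficients are hypothetical, since the paper's model order and coefficients are not given in this section.

```python
import numpy as np


def simulate_ar_with_noise(a, n, q_std, r_std, seed=0):
    """Generate s(k) = sum_i a[i] * s(k-1-i) + w(k) and y(k) = s(k) + v(k).

    a     : AR coefficients [a1, ..., ap]
    n     : number of samples
    q_std : standard deviation of the process noise w (the system input)
    r_std : standard deviation of the measurement noise v
    Returns the clean signal s and the noisy observation y.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    p = len(a)
    s = np.zeros(n + p)                  # first p entries are zero initial conditions
    w = rng.normal(0.0, q_std, n)
    for k in range(n):
        past = s[k:k + p][::-1]          # previous p samples, newest first
        s[k + p] = a @ past + w[k]
    s = s[p:]
    y = s + rng.normal(0.0, r_std, n)    # additive measurement noise
    return s, y
```

For example, `simulate_ar_with_noise([1.3, -0.4], 1000, q_std=0.1, r_std=0.2)` returns a stable AR(2) signal (poles at 0.8 and 0.5) together with its noisy observation, which can serve as clean and noisy test inputs.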
The state-space representation of the system is as follows [6]:
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
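For reference, a common textbook way to write an AR(p) speech model in state-space form is sketched below. It is an assumption for illustration, consistent with the AR model named in the caption of Fig. 1, and may differ in notation from Eqs. (1) to (10). Here a_1, ..., a_p are the AR coefficients, w(k) is the process noise, and v(k) is the measurement noise.

```latex
\begin{aligned}
s(k) &= \sum_{i=1}^{p} a_i\, s(k-i) + w(k), \qquad
\mathbf{x}(k) = \begin{bmatrix} s(k) & s(k-1) & \cdots & s(k-p+1) \end{bmatrix}^{\mathsf{T}},\\
\mathbf{x}(k) &= \mathbf{F}\,\mathbf{x}(k-1) + \mathbf{G}\,w(k), \qquad
y(k) = \mathbf{H}\,\mathbf{x}(k) + v(k),\\
\mathbf{F} &= \begin{bmatrix}
a_1 & a_2 & \cdots & a_{p-1} & a_p \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}, \qquad
\mathbf{G} = \mathbf{H}^{\mathsf{T}} = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^{\mathsf{T}}.
\end{aligned}
```

In a nonlinear model of the kind discussed in Section 2, the first row of F x(k-1) would be replaced by a learned function of x(k-1), and the EKF would use the Jacobian of that function in place of F.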