Persian Speech Recognition Through the Combination of ANN/HMM

Khosravani pour, Ladan; Farrokhi, Ali

doi:10.30495/SPRE.2023.1056041

Manuscript ID : SPRE-2210-1201 (R3) Page: 47 - 68

Article Type: Original Research

Subject Areas : Signal Processing; Image Processing

Ladan Khosravani pour ¹, Ali Farrokhi ² *

1 - Department of Electrical Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.
2 - Department of Electrical Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.

Received: 2022-10-09 Accepted : 2023-02-02 Published : 2023-06-01

Keywords: Artificial Neural Networks, Discrete Fourier Transform, Linear Predictive Coding, Viterbi Algorithm, Recurrent Neural Networks, Hidden Markov Models, Probabilistic Neural Networks, Vector Digitizer, Fuzzy Expectation Maximization

Abstract :

The goal of this work is to build a speech recognition system capable of recognizing Persian speech. Prosody spans the hierarchical structure of speech, from rhythm and tonal expression down to the smallest syllable components, and provides important information about suprasegmental features such as F0 (fundamental frequency), intensity, and duration, which are crucial for natural-sounding speech. Prosodic features are highly language dependent; however, the relationship between linguistic features and prosodic data is not well understood in some languages. While relatively high-performance prosodic generators have been developed for many languages, very little work has been done on prosodic generators for Farsi. In this article, we first use a simple four-layer RNN to extract prosodic information, and then investigate a hybrid ANN/HMM model for Persian speech recognition. A total of 210 speech samples from a single male speaker were collected and, after noise removal, 47 of the samples were manually labeled phonetically. The remaining training samples were then labeled automatically, and new artificial neural networks (ANNs) of the three-layer MLP type were created for the final recognition. Four kinds of features were extracted: MEL, MEL derivative, energy, and energy derivative; the values of these four were combined and fed to the neural network. The neural network classifies these feature vectors and returns the most similar vowels. The resulting sequence of vowels is given as "observations" to HMMs (which are built from the pronunciations), and the most probable HMM (in other words, the most probable word) for the input sound is found and output. By applying recognition on 99.4% of the test data, we reached 100% accuracy in one case, which is a very favorable result considering the small amount of speech data.
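
The final step described in the abstract, choosing the most probable word HMM for a sequence of classifier outputs, can be illustrated with a minimal sketch. It is not the authors' implementation: the two word models, their probabilities, the integer symbol labels, and the function names `viterbi_log_score` and `recognize` are all hypothetical, chosen only to show how a Viterbi score can rank candidate words.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(obs, start_p, trans_p, emit_p):
    """Return the log-probability of the single best state path for `obs`.

    obs     : list of integer symbols (here: labels produced by a classifier)
    start_p : start_p[s] is the probability of starting in state s
    trans_p : trans_p[i][j] is the probability of moving from state i to j
    emit_p  : emit_p[s][k] is the probability that state s emits symbol k
    """
    n_states = len(start_p)
    scores = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    for symbol in obs[1:]:
        scores = [
            max(scores[prev] + _log(trans_p[prev][s]) for prev in range(n_states))
            + _log(emit_p[s][symbol])
            for s in range(n_states)
        ]
    return max(scores)


def recognize(obs, word_models):
    """Pick the word whose HMM gives the highest Viterbi score for `obs`."""
    return max(word_models, key=lambda w: viterbi_log_score(obs, *word_models[w]))


# Hypothetical toy models: two left-to-right HMMs over three symbols (0, 1, 2).
# All values are invented for illustration and are not taken from the paper.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
WORD_MODELS = {
    "word_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "word_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], WORD_MODELS))
    print(recognize([2, 2, 0], WORD_MODELS))
```

Working in the log domain avoids numerical underflow on longer observation sequences; in a real system the emission and transition probabilities would be estimated from labeled training data rather than written by hand.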
