Multi-resolution Feature Extraction Algorithm in Emotional Speech Recognition
In this paper, a new approach to recognizing emotions in speech from audio recordings is presented. To obtain the optimal processing window width for feature extraction and achieve the highest recognition rates, a trade-off between time and frequency resolution must be made. We therefore define a new procedure that combines the advantages of narrower and wider windows by dynamically adjusting the time and frequency resolution of individual features. To further raise recognition rates, two major procedures are added to the multi-resolution feature-extraction concept: the exclusion of features calculated on different processing window widths, and the use of only those parts of the recordings with the most explicit emotional content. To confirm the benefits of the algorithm, audio recordings from the Interface emotional speech database together with four different classifiers were used in the evaluation. The highest emotion recognition rate achieved with the multi-resolution approach exceeded that of the best single-resolution approach by 3.5%, with an average improvement of 1.5% in absolute terms.
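The core idea of the abstract, extracting the same recording at several processing window widths and combining the resulting feature trajectories, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the window widths (20 ms and 60 ms), the example features (short-time energy and zero-crossing rate), and all function names are assumptions introduced here.

```python
# Hedged sketch of multi-resolution feature extraction: the same signal is
# framed at several window widths and the per-width feature statistics are
# concatenated into one feature vector. Window widths and features are
# illustrative assumptions, not the paper's exact configuration.
import numpy as np

def frame_signal(x, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames of length frame_len."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop_len)
    return np.stack([x[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])

def frame_features(frames):
    """Per-frame short-time energy and zero-crossing rate."""
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

def multi_resolution_features(x, sr, widths_ms=(20, 60)):
    """Concatenate feature statistics computed at several window widths.

    A narrow window gives good time resolution, a wide one good frequency
    resolution; using both sidesteps the single-width trade-off.
    """
    feats = []
    for w_ms in widths_ms:
        frame_len = int(sr * w_ms / 1000)
        frames = frame_signal(x, frame_len, frame_len // 2)  # 50% overlap
        energy, zcr = frame_features(frames)
        # Summarize each per-frame trajectory by its mean and std.
        feats += [energy.mean(), energy.std(), zcr.mean(), zcr.std()]
    return np.array(feats)

sr = 16000
t = np.arange(sr) / sr               # 1 s of synthetic "speech"
x = np.sin(2 * np.pi * 220 * t)
fv = multi_resolution_features(x, sr)
print(fv.shape)                      # → (8,): 4 statistics per window width
```

The resulting fixed-length vector could then be fed to any of the classifiers mentioned in the abstract; the paper's additional steps (excluding features at certain widths, keeping only the most emotionally explicit segments) would operate on top of such a representation.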
The copyright for the paper in this journal is retained by the author(s), with the first publication right granted to the journal. The authors agree to the Creative Commons Attribution 4.0 (CC BY 4.0) license under which the paper in the journal is published.
By virtue of their appearance in this open access journal, papers are free to use, with proper attribution, in educational and other non-commercial settings, with an acknowledgement of their initial publication in the journal.