Tree-based Phone Duration Modelling of the Serbian Language
Keywords:Decision trees, machine learning algorithms, speech, speech synthesis
AbstractConsidering the importance of segmental duration from a perceptive point of view, the possibility of automatic prediction of natural duration of phones is essential for achieving the naturalness of synthesized speech. In this paper phone duration prediction model for the Serbian language using tree-based machine learning approach is presented. A large speech corpus and a feature set of 21 parameters describing phones and their contexts were used for segmental duration prediction. Phone duration modelling is based on attributes such as the current segment identity, preceding and following segment types, manner of articulation (for consonants) and voicing of neighbouring phones, lexical stress, part-of-speech, word length, the position of the segment in the syllable, the position of the syllable in a word, the position of a word in a phrase, phrase break level, etc. These features have been extracted from the large speech database for the Serbian language. The results obtained for the full phoneme set using regression tree, RMSE (root-mean-squared-error) 14.8914 ms, MAE (mean absolute error) 11.1947 ms and correlation coefficient 0.8796 are comparable with those reported in the literature for Czech, Greek, Lithuanian, Korean, Indian languages Hindi and Telugu, Turkish.
How to Cite
The copyright for the paper in this journal is retained by the author(s) with the first publication right granted to the journal. The authors agree to the Creative Commons Attribution 4.0 (CC BY 4.0) agreement under which the paper in the Journal is licensed.
By virtue of their appearance in this open access journal, papers are free to use with proper attribution in educational and other non-commercial settings with an acknowledgement of the initial publication in the journal.