Tree-based Phone Duration Modelling of the Serbian Language

Authors

  • S. Sovilj-Nikic University of Novi Sad
  • V. Delic University of Novi Sad
  • I. Sovilj-Nikic University of Novi Sad
  • M. Markovic University of Novi Sad

DOI:

https://doi.org/10.5755/j01.eee.20.3.4090

Keywords:

Decision trees, machine learning algorithms, speech, speech synthesis

Abstract

Because segmental duration is perceptually important, automatic prediction of natural phone durations is essential for achieving natural-sounding synthesized speech. This paper presents a phone duration prediction model for the Serbian language based on a tree-based machine learning approach. A large speech corpus and a feature set of 21 parameters describing phones and their contexts were used for segmental duration prediction. Phone duration modelling is based on attributes such as the current segment identity, preceding and following segment types, manner of articulation (for consonants) and voicing of neighbouring phones, lexical stress, part of speech, word length, the position of the segment in the syllable, the position of the syllable in the word, the position of the word in the phrase, phrase break level, etc. These features were extracted from a large speech database for the Serbian language. The results obtained for the full phoneme set using a regression tree, namely an RMSE (root-mean-squared error) of 14.8914 ms, an MAE (mean absolute error) of 11.1947 ms and a correlation coefficient of 0.8796, are comparable with those reported in the literature for Czech, Greek, Lithuanian, Korean, the Indian languages Hindi and Telugu, and Turkish.
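The three evaluation measures quoted in the abstract (RMSE, MAE and the correlation coefficient) can be illustrated with a minimal sketch. It assumes the correlation coefficient is the Pearson coefficient between predicted and measured durations, which the abstract does not state explicitly. The function names and the duration values below are hypothetical and are not taken from the paper.

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error between predicted and measured durations (ms)."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def mae(predicted, actual):
    """Mean absolute error between predicted and measured durations (ms)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def pearson(predicted, actual):
    """Pearson correlation coefficient (assumed to be the reported measure)."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

# Hypothetical phone durations in milliseconds, for illustration only.
actual_ms = [60.0, 80.0, 100.0, 120.0]
predicted_ms = [70.0, 70.0, 110.0, 110.0]

print("RMSE (ms):", rmse(predicted_ms, actual_ms))
print("MAE (ms):", mae(predicted_ms, actual_ms))
print("Correlation:", pearson(predicted_ms, actual_ms))
```

RMSE penalises large deviations more heavily than MAE, so reporting both gives a rough indication of how much of the error comes from outliers.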

Published

2014-03-04

How to Cite

Sovilj-Nikic, S., Delic, V., Sovilj-Nikic, I., & Markovic, M. (2014). Tree-based Phone Duration Modelling of the Serbian Language. Elektronika Ir Elektrotechnika, 20(3), 77-82. https://doi.org/10.5755/j01.eee.20.3.4090

Section

SIGNAL TECHNOLOGY