Viseme Recognition System Based on Transformed Acoustic Models
Abstract

Viseme recognition from speech is one of the methods needed to operate a talking-head system, which can be used in various areas, such as mobile services and applications, gaming, and the entertainment industry. This paper proposes a novel method for generating acoustic models for viseme recognition from speech. The viseme acoustic models were generated using transformations of trained phoneme acoustic models. The proposed transformation method is language-independent; only the available speech resources are needed. A viseme sequence with corresponding time information was produced as the result of recognition using context-dependent acoustic models. The proposed acoustic-model transformation method was evaluated on a test scenario with phonetically balanced words, and the results were compared to a baseline viseme recognition system. The improvement in viseme accuracy was statistically significant when the proposed method for transforming acoustic models was used.
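Deriving viseme models from phoneme models typically starts from a many-to-one phoneme-to-viseme mapping, since several phonemes share the same visible mouth shape. The sketch below illustrates that idea only; the mapping table and function names are simplified assumptions for illustration, not the transformation actually proposed in this paper.

```python
# Illustrative many-to-one phoneme-to-viseme mapping (simplified example;
# NOT the paper's actual mapping table or transformation method).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
    "a": "open_vowel", "e": "mid_vowel", "i": "spread_vowel",
    "o": "rounded_vowel", "u": "rounded_vowel",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to a viseme sequence, merging adjacent
    phonemes that collapse onto the same viseme class."""
    visemes = []
    for ph in phonemes:
        v = PHONEME_TO_VISEME.get(ph, "neutral")  # unknown phonemes -> neutral
        if not visemes or visemes[-1] != v:
            visemes.append(v)
    return visemes

# "b a m p": the trailing m and p share the bilabial viseme and merge.
print(phonemes_to_visemes(["b", "a", "m", "p"]))
```

Because the mapping is many-to-one, the viseme inventory is smaller than the phoneme inventory, which is what makes it attractive to transform existing phoneme acoustic models rather than train viseme models from scratch.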
Authors retain copyright and grant the journal the right of first publication, with the paper simultaneously licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) licence.
Authors are allowed to enter into separate, additional contractual arrangements for the non-exclusive distribution of the paper published in the journal with an acknowledgement of the initial publication in the journal.
Copyright terms are set out in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4–37.