Multimodal multilingual human-machine speech communication

Many of us talk to digital voice assistants every day, and the extent of our conversation with machines can be expected to grow. At the same time, visual cues from facial movements (most notably lip movements) have been shown to improve the accuracy of speech recognition, particularly in the presence of noise or other acoustic degradation, and even to increase the intelligibility of synthesized speech. The principal objective of the Project is to develop advanced machine learning algorithms in the field of human-machine audio-visual speech communication.

Participants

The core of the team consists of researchers from the Laboratory for Acoustics and Speech Technology at the Faculty of Technical Sciences, University of Novi Sad, Serbia. In cooperation with the Serbian company “AlfaNum”, this team has already developed neural-network-based speech recognition and synthesis for Serbian and kindred languages, as well as the speech and language corpora on which these technologies critically depend.

Objectives

The Project focuses on the development of advanced machine learning algorithms in the field of human-machine audio-visual speech communication. The research will rely on two multilingual audio-visual speech corpora to be developed within the Project (one recorded in controlled conditions and the other based on videos collected from the Internet), and it will build on state-of-the-art methods in artificial intelligence, including feature extraction with deep learning and temporal modelling with recurrent neural networks or temporal convolutional networks. The models will be multilingual in the sense that they can be trained on combined data from multiple languages and can recognize and synthesize speech in any of them, owing to network embeddings at the phonetic level. The models will be evaluated by integrating audio-visual speech recognition and synthesis into two existing speech technology products for Serbian, but they will be applicable to any language. Thus, besides contributing to the preservation of the Serbian language in the digital age, the Project will have a clear economic and social impact at the international level, given the increasing use of speech technology in everyday life.