AI-SPEAK

The project team includes leading researchers of the Faculty of Technical Sciences, University of Novi Sad (FTS-UNS), who have already achieved excellent results in the research and development of speech technology within the technological development project TR32035, “The development of dialog systems for Serbian and other South Slavic languages”, carried out from 2011 to 2019 with financial support from the Ministry of Education, Science and Technological Development. The AI-SPEAK project addresses paralinguistic aspects of human-machine speech communication, which encompass various speech styles and emotional expressions, in both their auditory and visual manifestations.

Milan Sečujski is the principal researcher of the AI-SPEAK project and a leading researcher in computational linguistics and natural language processing, with over 160 scientific publications. He has made key contributions to the development of speech and language resources for Serbian and kindred languages, as well as to intonation modelling. He authored the only existing morphological dictionaries of Serbian and Croatian that explicitly list inflected word forms (more than 5 million each), which form the basis for all speech technology applications in these two standard variants of Serbo-Croatian, and developed a prosodic annotation system for them (used for structuring speech data and marking it for prosodic events such as accents, phrase breaks and sentence emphasis). He also lectures in the courses “Machine learning 1” and “Machine learning 2” in two BSc study programmes at FTS-UNS, as well as in the course “Speech technology” in the MSc studies of Power Engineering, Electronics and Telecommunications at FTS-UNS.

Vlado Delić is the leader of the Speech Technology Group at the Department of Power, Electronic and Telecommunication Engineering at FTS-UNS, which has been dedicated to the development of speech technologies for over 20 years. He is the principal lecturer for the course “Human-machine speech communication” in the PhD studies at FTS-UNS, and more than 10 PhD theses in this area have been prepared and defended under his mentorship, including those of project team members Milan Sečujski, Branislav Popović and Siniša Suzić.

Branislav Popović is a leading researcher and the youngest PhD at FTS-UNS, with more than 80 publications in the area of speech technology, covering both acoustic and language modelling (PhD thesis: “Hierarchical clustering of Gaussian mixture models in applications for continuous speech recognition”, under the mentorship of Vlado Delić, 2012). He was also the principal lecturer for the course “Artificial Intelligence” at the Faculty of Information Technology as well as the Faculty of Mathematics and Computer Science of the Alfa BK University in Belgrade, and he lectures in the course “Speech technology” in the MSc studies of Power Engineering, Electronics and Telecommunications at FTS-UNS.

Lidija Krstanović is a researcher with wide expertise in applied mathematics, machine learning and image processing (PhD thesis: “GMMs Similarity Measure Based on Transformation of the Parameter Space”, 2017). She is also the principal lecturer for the course “Artificial Intelligence” in the MSc study programme Animation in Engineering at FTS-UNS.

Branko Brkljač is a researcher with wide expertise in machine learning and image processing (PhD thesis: “Pattern recognition with sparse representation of covariance matrices and covariance descriptors”, 2017). He is also the principal lecturer for the course “Computer Vision” in the MSc study programme Power Engineering, Electronics and Telecommunications (study group Signal Processing) at FTS-UNS.

Nikola Simić is a researcher with wide experience in autoencoding and compression techniques (PhD thesis: “The designing of quantizers in signal compression algorithms”, 2019) as well as in digital image processing, as a participant in the international HORIZON 2020 project MARVEL, which is concerned with auditory and visual scene analysis in smart cities.

Nikša Jakovljević is a researcher with wide experience in the area of automatic speech recognition (PhD thesis: “Application of sparse representation to Gaussian mixture models used in automatic speech recognition”, completed in 2014 under the mentorship of Prof. Vlado Delić), as well as in the area of digital image processing, through participation in the HORIZON 2020 projects MARVEL (audiovisual scene analysis in smart cities) and INCISIVE (biomedical image processing aimed at cancer detection).

Siniša Suzić defended his PhD thesis, entitled “Parametric synthesis of expressive speech”, in 2019 under the mentorship of Prof. Vlado Delić. His latest research, carried out with associates Prof. Milan Sečujski and Tijana Nosek, has resulted in 3 papers in international journals and a number of international conference papers dealing with speaker and style adaptation in neural-network-based TTS.

Tijana Nosek is a researcher with very wide experience in the area of speech synthesis, particularly synthesis based on neural networks and artificial intelligence. Her scientific results include the development of a speech synthesizer capable of flexibly changing the speaking style, the speaker identity and the language itself. She defended her PhD thesis, entitled “Expressive multilingual speech synthesizer”, in 2023.

Vuk Stanojev is a teaching assistant working on his PhD research in the area of multimodal speech technology, under the mentorship of Prof. Milan Sečujski.

Team