AI-SPEAK

"Speech synthesis in Serbian - a comparison of end-to-end models and models using language processing", YU INFO, March 10-13, 2024, Kopaonik, Serbia (link)

"Automatic speech recognition in Serbian - a comparison of a Whisper-based system and a conventional system using language modelling", YU INFO, March 10-13, 2024, Kopaonik, Serbia (link)

"Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering", Applied Sciences - Basel, 2024, ISSN: 2076-3417, Special Issue "Speech Recognition and Natural Language Processing", Asad Abdi, Farid Meziane (Eds.), Vol. 14, No. 18: 8223, pp. 1-20, DOI: 10.3390/app14188223, MDPI (link)

"Basic Computational Geometry Applications in Computer Graphics", 9th Conference on Mathematics in Engineering: Theory and Applications, Novi Sad, May 31st-June 2nd, 2024 (link)

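"Development of voice assistants and their applications in smart homes and smart cities" (In Serbian: "Razvoj govornih asistenata i njihove primene u pametnim kućama i pametnim gradovima"), Proceedings 2022/2023 of the "Series of professional lectures dedicated to improving the design of telecommunication networks and systems" (In Serbian: "Zbornik radova 2022/2023 Serija stručnih predavanja posvećenih unapređenju projektovanja telekomunikacionih mreža i sistema"), Ed. Mirjana Jarić-Ćirić, FTTH udruženje Srbija (FTTH Association Serbia), Belgrade, pp. 202-213, 2024 (link)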

"Multimodal Emotion Recognition Using Compressed Graph Neural Networks", In Proc. Speech and Computer (SPECOM 2024), Belgrade, Serbia, November 25-27, 2024. Eds: Alexey Karpov, Vlado Delić, pp. 109-121, Springer (link)

"End-to-End Speech Synthesis for the Serbian Language Based on Tacotron", In Proc. Speech and Computer (SPECOM 2024), Belgrade, Serbia, November 25-27, 2024. Eds: Alexey Karpov, Vlado Delić, pp. 219-229, Springer (link)

"Retrospective and Perspectives of TTS & STT Technology Development and Implementation for South Slavic Under-Resourced Languages", In Proc. Speech and Computer (SPECOM 2024), Belgrade, Serbia, November 25-27, 2024. Eds: Alexey Karpov, Vlado Delić, pp. 23-44, Springer (link)

"Probability Density Function Distance-Based Augmented CycleGAN for Image Domain Translation with Asymmetric Sample Size", Mathematics, 2025, Multidisciplinary Digital Publishing Institute (MDPI), ISSN: 2227-7390, Vol. 13, No. 9: 1406. (link)

"Transforming Faces Into Video Stories-VideoFace2.0", Proc. 14th Mediterranean Conference on Embedded Computing (MECO), pp. 251-254, Budva, Montenegro, ISBN 979-8-3315-1341-2, 2025. (link)

"Person detection and re-identification in open-world settings of retail stores and public spaces", Proc. 2nd Int. Sci. Conf. ALFATECH – Smart Cities and Modern Technologies 2025, Belgrade, Serbia, 2025. (link)

"Named Entity Recognition for Serbian Legal Documents: Design, Methodology and Dataset Development", Proc. 5th International Conference on Information Society and Technology (ICIST), Springer Lecture Notes in Networks and Systems, Kopaonik, Serbia, 2025. (link)

"Exploiting voice conversion in creating new TTS voices", Proc. 32nd International Conference on Systems, Signals and Image Processing (IWSSIP), Skopje, North Macedonia, 2025. (link)

"Recording of a bilingual corpus AI-SPEAK for multimodal speech recognition" (In Serbian: "Snimanje bilingvalne baze AI-SPEAK za multimodalno prepoznavanje govora"), Proc. 31st National Information-Communication Technology Conference (YUinfo) 2025, Kopaonik, Serbia, 2025. (link)

"Influence of training data augmentations on the performance of a finetuned Whisper model" (In Serbian: "Uticaj augmentacija trening podataka na performanse doobučenog Whisper modela"), Proc. 31st National Information-Communication Technology Conference (YUinfo) 2025, Kopaonik, Serbia, 2025. (link)

D1.2

Implementation plan (link)

Quality control plan (link)

Dissemination plan (link)

D2.1

AI-SPEAK Speech Corpus (link)

D2.2

Report on VideoBase: AI-SPEAK Video Internet Database (link)

M1.1

Kick-off meeting report (link)

M2.1

Project meeting report (link)

M2.2

Project meeting report (link)

M3.1

Project meeting report (link)

Publications