Evaluation of Lithuanian Speech-to-Text Transcribers
Pub. online: 16 April 2025
Type: Research Article
Open Access
Received: 1 July 2024
Accepted: 1 April 2025
Published: 16 April 2025
Abstract
For more than two decades, Lithuanian speech recognition has been researched solely in Lithuania because it required deep knowledge of the Lithuanian language. Advances in AI now allow high-quality speech-to-text systems to be built without native knowledge, provided that sufficient annotated data is available. This study evaluated 18 Lithuanian speech transcribers on a short recording; the 7 best were then selected and evaluated on extensive data. The top system achieved a word error rate (WER) of 5.1% for Lithuanian words, with three others showing 8.7–9.2%. For other word-size tokens, such as numbers, speech disfluencies, abbreviations, and foreign words, a classification adapted to the Lithuanian language was proposed. Different processing strategies for tokens of these classes were examined, and the study assessed which transcribers tend to follow which strategies.
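For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the number of word substitutions, deletions, and insertions needed to turn the transcriber output into the reference, divided by the number of reference words. It assumes whitespace tokenization and exact string matching; the function name `word_error_rate` is hypothetical, and the text normalization and token handling actually used in the study are not described in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared as exact strings.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("labas rytas visiems", "labas vakaras visiems"))
```

Because the denominator is the reference length, insertions can push WER above 100%, so values from different transcribers are comparable only when they are scored against the same reference text.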
Biographies
Kasparaitis Pijus
P. Kasparaitis (born in 1967) graduated from Vilnius University (Faculty of Mathematics) in 1991. In 2001, he defended his PhD thesis “Lithuanian Text-to-Speech Synthesis”. He is currently an associate professor at Vilnius University. His research interests include text-to-speech synthesis, speech recognition, and other areas of computational linguistics.