Evaluation of Lithuanian Speech-to-Text Transcribers

Kasparaitis, Pijus

doi:10.15388/25-INFOR591

Informatica

Volume 36, Issue 2 (2025), pp. 369–384

Pub. online: 16 April 2025

Type: Research Article

Open Access

Received: 1 July 2024

Accepted: 1 April 2025

Published: 16 April 2025

Abstract

For more than two decades, Lithuanian speech recognition was researched solely in Lithuania, because the work required deep knowledge of the Lithuanian language. Advances in AI now allow high-quality speech-to-text systems to be built without native knowledge of the language, provided that sufficient annotated data is available. This study evaluated 18 Lithuanian speech transcribers on a short recording; the seven best were then selected and evaluated on extensive data. The top system achieved a word error rate (WER) of 5.1% for Lithuanian words, and three others achieved 8.7–9.2%. For other word-sized tokens, such as numbers, speech disfluencies, abbreviations, and foreign words, a classification adapted to the Lithuanian language is proposed. Different processing strategies for tokens of these classes were examined, and the study assessed which transcribers tend to follow which strategies.
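The WER figures above, and the character error rate listed among the keywords, are edit-distance ratios. The following is a minimal sketch assuming the standard definitions (substitutions, deletions and insertions divided by the length of the reference); it is not the scoring procedure used in the study, whose text normalization and token classification are not described in the abstract. The function names and the example sentences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (item missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra item in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.lower().split()  # assumed normalization: lowercase, split on whitespace
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    ref = list(reference.lower())
    hyp = list(hypothesis.lower())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up sentence pair, not taken from the study's data.
    reference = "labas rytas visiems"
    hypothesis = "labas vakaras visiems"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because the hypothesis can contain more words than the reference, WER computed this way can exceed 100%. How numbers, abbreviations and foreign words are normalized before scoring also changes the result, which is why the abstract reports those token classes separately from Lithuanian words.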

Biographies

Pijus Kasparaitis

pijus.kasparaitis@mif.vu.lt

P. Kasparaitis (born in 1967) graduated from Vilnius University (Faculty of Mathematics) in 1991. In 2001, he defended his PhD thesis “Lithuanian Text-to-Speech Synthesis”. He is currently an associate professor at Vilnius University. His research interests include text-to-speech synthesis, speech recognition, and other areas of computational linguistics.

Open access article under the CC BY license.

Keywords

speech-to-text transcription, automatic speech recognition, word error rate, character error rate, Lithuanian
