Few-Shot Training of Prototype Networks for Sign Language Recognition

Kalinowski, Michał; Kostek, Bożena

doi:10.15388/26-INFOR632

Informatica


Pub. online: 3 June 2026
Type: Research Article

Open Access

Received: 1 March 2026
Accepted: 1 May 2026
Published: 3 June 2026

Abstract

Limited proficiency in sign language creates communication barriers, motivating the development of robust automatic sign language recognition (SLR) systems. We address isolated SLR in a low-resource setting using few-shot metric-based meta-learning. Sign videos are encoded with spatiotemporal convolutional backbones and classified using a prototypical network, which enables generalization to unseen classes from small support sets. We compare the SlowFast architecture with state-of-the-art video models on the LSA64 benchmark under strict class-disjoint protocols. SlowFast achieves 94.33% accuracy and outperforms the competing backbones, indicating that the approach is effective and data-efficient for low-resource isolated SLR.
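
The classification step of a prototypical network can be illustrated with a minimal sketch. It assumes each sign video has already been encoded into a fixed-length embedding (in the article this is the job of a video backbone such as SlowFast; here the embeddings are toy 2-D vectors). The function names, the toy data, and the use of squared Euclidean distance are illustrative assumptions, not details taken from the article.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def class_prototypes(
    support: Sequence[Vector], labels: Sequence[Hashable]
) -> Dict[Hashable, List[float]]:
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for embedding, label in zip(support, labels):
        grouped[label].append(embedding)
    # zip(*embeddings) iterates over dimensions, so each prototype is a per-dimension mean.
    return {
        label: [sum(dim) / len(embeddings) for dim in zip(*embeddings)]
        for label, embeddings in grouped.items()
    }


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, prototypes: Dict[Hashable, List[float]]) -> Hashable:
    """Assign the query to the class whose prototype is nearest."""
    return min(prototypes, key=lambda label: squared_euclidean(query, prototypes[label]))


# Toy 2-way 2-shot episode with hypothetical class names.
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["sign_a", "sign_a", "sign_b", "sign_b"]
prototypes = class_prototypes(support, labels)
prediction = classify([0.9, 1.0], prototypes)
print(prediction)
```

Because the prototypes are computed from the support set at inference time, classes that were never seen during training can be recognized without retraining the encoder, which is what makes the setup suitable for class-disjoint evaluation.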


Biographies

Kalinowski Michał

https://orcid.org/0009-0003-8299-7704

michal.kalinowski@pg.edu.pl

M. Kalinowski is a PhD candidate at the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology (Gdansk Tech). He completed his studies at the same faculty in 2023, specializing in artificial intelligence. During his studies, he took part in a research project that was recognized with the Dean’s Award. He currently conducts research on sign language processing using deep learning methods, with a particular interest in sign language recognition and representation learning using multimodal approaches. His broader interests include generative models and agentic AI systems.

Kostek Bożena

https://orcid.org/0000-0001-6288-2908

bozena.kostek@pg.edu.pl

B. Kostek is a professor at the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Poland. She is a full member of the Polish Academy of Sciences and a fellow of the Audio Engineering Society and the Acoustical Society of America. Her primary scientific interests include signal processing, psychoacoustics, multimedia, music information retrieval, cognitive and behavioural processing, as well as the applications of machine learning to these domains. She is the recipient of many prestigious research awards, including those of the Prime Minister of Poland (twice), the Ministry of Science, and the Polish Academy of Sciences. She was the editor-in-chief of the Journal of the Audio Engineering Society, as well as Associate Editor of IEEE/ACM TASLP and Guest Editor of JASA, JIIS, and JAES.

Open access article under the CC BY license.

Keywords

Isolated Sign Language Recognition; few-shot learning; SlowFast; prototype networks; UMAP
