Sign Language Translator Using Transformer Model
DOI:
https://doi.org/10.64252/9j23fm84
Keywords:
Sign Language Recognition (SLR), Deep Learning, Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, Spatiotemporal dependencies, Transformer-based framework, Self-attention mechanism, Raw image frames, MediaPipe-extracted skeletal keypoints, Self-supervised learning
Abstract
Sign Language Recognition (SLR) is important for building communication bridges between deaf and hearing populations. Traditional CNN and LSTM models struggle to capture spatiotemporal complexity, particularly in continuous sign language, so we introduce a Transformer-based dual-stream approach that uses self-attention to extract spatial and temporal relationships. Our method processes raw video frames and MediaPipe-extracted skeletal keypoints, using self-supervised masked feature prediction and contrastive learning to improve generalization in low-resource settings. Motivated by SLGTformer, we incorporate hierarchical attention layers to capture fine-grained gesture subtleties. Tested on the ISL-CSLTR dataset, our model outperforms CNN-LSTM and state-of-the-art SLR baselines on both isolated and continuous gesture recognition, and it generalizes well to out-of-distribution signs with small amounts of labelled data. This research advances real-time, accessible AI for scalable and practical sign language translation.
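As a rough point of reference for the dual-stream design summarized above, the sketch below illustrates only the keypoint stream: MediaPipe Holistic landmarks extracted per frame and fed through a PyTorch Transformer encoder with self-attention. The layer sizes, the 100-class output head, and the file name `example_sign.mp4` are illustrative assumptions rather than the paper's published configuration; the raw-frame stream, its fusion with the keypoint stream, and the self-supervised pretraining objectives are not shown.

```python
# Minimal sketch of the keypoint stream, assuming MediaPipe Holistic and a
# PyTorch TransformerEncoder. Dimensions and class count are illustrative.
import cv2
import mediapipe as mp
import torch
import torch.nn as nn

mp_holistic = mp.solutions.holistic


def extract_keypoints(video_path: str) -> torch.Tensor:
    """Return a (num_frames, 225) tensor of pose + both-hand keypoints."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            feats = []
            # 33 pose + 21 left-hand + 21 right-hand landmarks, (x, y, z) each.
            for lm_set, count in (
                (results.pose_landmarks, 33),
                (results.left_hand_landmarks, 21),
                (results.right_hand_landmarks, 21),
            ):
                if lm_set is not None:
                    feats.extend(c for lm in lm_set.landmark for c in (lm.x, lm.y, lm.z))
                else:
                    feats.extend([0.0] * count * 3)  # pad missing detections
            frames.append(feats)
    cap.release()
    return torch.tensor(frames, dtype=torch.float32)


class KeypointTransformer(nn.Module):
    """Self-attention encoder over the skeletal-keypoint sequence."""

    def __init__(self, in_dim: int = 225, d_model: int = 256,
                 nhead: int = 8, num_layers: int = 4, num_classes: int = 100):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, in_dim) -> gloss logits via mean-pooled frame embeddings.
        h = self.encoder(self.embed(x))
        return self.head(h.mean(dim=1))


if __name__ == "__main__":
    keypoints = extract_keypoints("example_sign.mp4")   # hypothetical clip
    model = KeypointTransformer(num_classes=100)        # assumed gloss vocabulary size
    logits = model(keypoints.unsqueeze(0))              # add batch dimension
    print(logits.shape)                                 # torch.Size([1, 100])
```

In a full dual-stream setup, a parallel encoder over the raw frames would be fused with this keypoint encoder before the classification head; the sketch keeps only the keypoint half to stay compact.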