Sign Language Translator Using Transformer Model

Authors

  • Gali Ravi Kiran
  • Varada Srinadh
  • V. Ankitha
  • S. Surekha

DOI:

https://doi.org/10.64252/9j23fm84

Keywords:

Sign Language Recognition (SLR), Deep Learning, Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, Spatiotemporal dependencies, Transformer-based framework, Self-attention mechanism, Raw image frames, MediaPipe-extracted skeletal keypoints, Self-supervised learning

Abstract

Sign Language Recognition (SLR) is essential for bridging communication between deaf and hearing populations. Traditional CNN and LSTM models struggle to capture spatiotemporal complexities, particularly in continuous sign language. To address this, we introduce a Transformer-based dual-stream approach that uses self-attention to extract spatial and temporal relationships. Our method processes raw video frames together with MediaPipe-extracted skeletal keypoints, and employs self-supervised masked feature prediction and contrastive learning to improve generalization in low-resource environments. Motivated by SLGTformer, we incorporate hierarchical attention layers to capture fine gesture subtleties. Evaluated on the ISL-CSLTR dataset, our model outperforms CNN-LSTM and state-of-the-art SLR baselines on both isolated and continuous gesture recognition, and generalizes well to out-of-distribution signs with small amounts of labelled data. This research advances real-time, accessible AI for scalable and practical sign language translation.
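To make the dual-stream design concrete, the sketch below shows how per-frame visual features and MediaPipe keypoint sequences could each be encoded by a separate Transformer stream and fused for gesture classification. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the module names, dimensions (e.g., 258 keypoint values per frame from MediaPipe Holistic pose and hand landmarks), the mean-pool late-fusion strategy, and the classifier head are all illustrative assumptions, and the hierarchical attention layers and self-supervised pretraining described in the abstract are omitted.

```python
# Minimal dual-stream Transformer sketch for SLR (illustrative only).
import torch
import torch.nn as nn

class DualStreamSLR(nn.Module):
    def __init__(self, frame_dim=512, keypoint_dim=258, d_model=256,
                 nhead=8, num_layers=4, num_classes=100):
        super().__init__()
        # Stream 1: per-frame visual features (assumed precomputed, e.g. by a CNN).
        self.frame_proj = nn.Linear(frame_dim, d_model)
        # Stream 2: flattened MediaPipe skeletal keypoints per frame.
        self.keypoint_proj = nn.Linear(keypoint_dim, d_model)

        # Independent self-attention encoders, one per stream.
        self.frame_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                       batch_first=True), num_layers)
        self.keypoint_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                       batch_first=True), num_layers)

        # Late fusion: concatenate the pooled stream representations.
        self.classifier = nn.Linear(2 * d_model, num_classes)

    def forward(self, frames, keypoints):
        # frames:    (batch, time, frame_dim)
        # keypoints: (batch, time, keypoint_dim)
        f = self.frame_encoder(self.frame_proj(frames))
        k = self.keypoint_encoder(self.keypoint_proj(keypoints))
        # Mean-pool each stream over time, then concatenate and classify.
        fused = torch.cat([f.mean(dim=1), k.mean(dim=1)], dim=-1)
        return self.classifier(fused)

model = DualStreamSLR()
logits = model(torch.randn(2, 16, 512), torch.randn(2, 16, 258))
print(logits.shape)  # torch.Size([2, 100])
```

Keeping the two encoders separate lets each stream learn modality-specific temporal attention before fusion; a cross-attention fusion layer would be a natural alternative under the same assumptions.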

Published

2025-07-17

Section

Articles

How to Cite

Sign Language Translator Using Transformer Model. (2025). International Journal of Environmental Sciences, 3285-3298. https://doi.org/10.64252/9j23fm84