Spoken Language Identification using Wav2Vec2.0 in Indian Languages: A Self-Supervised and Baseline Comparison

Authors

  • Payal Goel
  • Shweta Bansal

DOI:

https://doi.org/10.64252/rs8c1w40

Keywords:

Spoken Language Identification, Wav2Vec2.0, Self-Supervised Learning, Indian Languages, Transformer, BiLSTM, MFCC, Low-Resource Speech Processing

Abstract

Spoken Language Identification (LID) is the first processing step for speech applications that operate across multiple languages, particularly in Indian settings, which combine high linguistic diversity with limited resources. Traditional LID systems rely on hand-crafted acoustic features such as MFCCs and require large amounts of labeled training data, so they handle under-represented languages poorly. This study evaluates Wav2Vec2.0, a self-supervised model that learns from raw audio waveforms, for identifying ten Indian languages drawn from the Indo-Aryan and Dravidian families. Wav2Vec2.0 is compared against MFCC-based deep learning baselines: an RNN, a BiLSTM, and a hybrid RNN+BiLSTM model. On a multilingual audio corpus, Wav2Vec2.0 reached a test accuracy of 93.7% and a Word Error Rate of 10.3%, outperforming the traditional baselines. Ablation analyses, confusion matrix assessments, and language-specific performance metrics indicate that the model captures fine phonetic detail. The results suggest that Wav2Vec2.0 is an effective framework for building LID systems for low-resource, practical multilingual applications.
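
The evaluation measures named in the abstract (overall accuracy, a confusion matrix, and language-specific metrics) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the three language codes, the toy labels, and the helper names `confusion_matrix`, `per_language_recall`, and `accuracy` are all assumptions chosen for illustration, and per-language recall stands in for whatever language-specific metric the paper actually reports.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true languages, columns are predicted languages."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_language_recall(matrix, labels):
    """Fraction of each language's utterances that were identified correctly."""
    recall = {}
    for i, lang in enumerate(labels):
        total = sum(matrix[i])
        recall[lang] = matrix[i][i] / total if total else 0.0
    return recall


def accuracy(matrix):
    """Overall accuracy: diagonal mass divided by total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical toy data: three language codes, six utterances.
labels = ["hi", "ta", "te"]
y_true = ["hi", "hi", "ta", "ta", "te", "te"]
y_pred = ["hi", "ta", "ta", "ta", "te", "hi"]

cm = confusion_matrix(y_true, y_pred, labels)
recall = per_language_recall(cm, labels)
overall = accuracy(cm)
```

Off-diagonal cells of `cm` show which language pairs are confused with each other, which is the kind of information a confusion matrix assessment is meant to expose.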

Published

2025-06-24

Section

Articles

How to Cite

Spoken Language Identification using Wav2Vec2.0 in Indian Languages: A Self-Supervised and Baseline Comparison. (2025). International Journal of Environmental Sciences, 330-361. https://doi.org/10.64252/rs8c1w40