Audio-Driven Prediction of Child Language Proficiency: A Hybrid Transformer-LightGBM Ensemble Framework for Automated Evaluation

Authors

  • Latha N.R.
  • Pallavi G. B.
  • Shyamala G.
  • Cherukupalli Yashwitha Reddy

DOI:

https://doi.org/10.64252/w4y45395

Keywords:

Child ESL, Speech-to-text, Transformer, Wav2Vec, Whisper Model

Abstract

This paper presents a novel approach for the automated assessment of child English as a Second Language (ESL) proficiency using transcript-based analysis of speech recordings. Using 5,000 raw audio files, the proposed pipeline first extracts transcript-based linguistic features with Whisper, along with prosodic and acoustic characteristics with Wav2Vec. These features are combined with transformer embeddings and evaluated using a hybrid Transformer and LightGBM model. Experimental results demonstrate strong performance, with an accuracy of 0.972 and a Pearson correlation of 0.98, outperforming baseline machine learning approaches. A comparative analysis with state-of-the-art methods, including ASR-driven GPT classifiers, highlights the advantages of the proposed offline, cost-efficient pipeline while maintaining high predictive fidelity. The system further supports real-time user feedback by analyzing key linguistic and syntactic indicators, enabling practical applications in educational and language-learning environments. Overall, this study demonstrates the effectiveness of combining speech-driven embeddings with ensemble machine learning for precise, scalable assessment of child ESL proficiency.
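The abstract does not state how the extracted features are joined or how the Transformer and LightGBM outputs are combined into one prediction. The sketch below shows one common reading, under stated assumptions: early fusion by concatenating per-recording feature vectors, late fusion by a weighted average (soft voting) of the two models' class probabilities, and the two reported metrics, accuracy and Pearson correlation. The function names (`fuse_features`, `soft_vote`), the fusion weight `w`, the three proficiency levels, and all probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: the paper does not publish its fusion or ensembling code.


def fuse_features(*blocks: Sequence[float]) -> List[float]:
    """Early fusion (assumed): concatenate per-recording feature blocks, e.g.
    transcript-based linguistic features, acoustic features and a transformer
    embedding, into a single input vector."""
    return [float(v) for block in blocks for v in block]


def soft_vote(p_a: Sequence[float], p_b: Sequence[float], w: float = 0.5) -> List[float]:
    """Late fusion (assumed): weighted average of two models' class probabilities.
    `w` is a hypothetical weight on the first model."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value, i.e. the predicted proficiency level."""
    return max(range(len(xs)), key=lambda i: xs[i])


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of recordings whose predicted level matches the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy data (invented): 5 recordings, 3 hypothetical proficiency levels (0, 1, 2).
    y_true = [0, 1, 2, 2, 1]
    p_transformer = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6],
                     [0.2, 0.5, 0.3], [0.1, 0.8, 0.1]]
    p_lgbm = [[0.6, 0.3, 0.1], [0.4, 0.35, 0.25], [0.1, 0.2, 0.7],
              [0.1, 0.3, 0.6], [0.2, 0.6, 0.2]]

    pred_t = [argmax(p) for p in p_transformer]
    pred_l = [argmax(p) for p in p_lgbm]
    pred_e = [argmax(soft_vote(a, b)) for a, b in zip(p_transformer, p_lgbm)]

    print("transformer only:", accuracy(y_true, pred_t))
    print("lightgbm only:   ", accuracy(y_true, pred_l))
    print("ensemble:        ", accuracy(y_true, pred_e))
    print("ensemble Pearson:", round(pearson(y_true, pred_e), 3))
```

With these toy probabilities, each model alone misclassifies one recording (accuracy 0.8), while the averaged probabilities classify all five correctly. Soft voting is only one possible ensemble rule; stacking the Transformer output as a LightGBM input, or learning the weight `w` on held-out data, would be equally plausible readings of "hybrid".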

Published

2025-09-08

Section

Articles

How to Cite

Audio-Driven Prediction of Child Language Proficiency: A Hybrid Transformer-LightGBM Ensemble Framework for Automated Evaluation. (2025). International Journal of Environmental Sciences, 1529-1536. https://doi.org/10.64252/w4y45395