Audio-Driven Prediction of Child Language Proficiency: A Hybrid Transformer-LightGBM Ensemble Framework for Automated Evaluation

Latha N.R.; Pallavi G. B.; Shyamala G.; Cherukupalli Yashwitha Reddy

doi:10.64252/w4y45395

Keywords:

Child ESL, Speech-to-text, Transformer, Wav2Vec, Whisper Model

Abstract

This paper presents a novel approach for the automated assessment of child English as a Second Language (ESL) proficiency using transcript-based analysis of speech recordings. Using a dataset of 5,000 raw audio files, the proposed pipeline first extracts transcript-based linguistic features with Whisper, along with prosodic and acoustic characteristics with Wav2Vec. These features are then combined with transformer embeddings and evaluated using a hybrid Transformer-LightGBM ensemble. Experimental results demonstrate strong performance, with an accuracy of 0.972 and a Pearson correlation of 0.98, outperforming baseline machine learning approaches. Comparative analysis with state-of-the-art methods, including ASR-driven GPT classifiers, highlights the advantages of the proposed offline, cost-efficient pipeline while maintaining high predictive fidelity. The system further supports real-time user feedback by analyzing key linguistic and syntactic indicators, enabling practical applications in educational and language-learning environments. Overall, this study demonstrates the effectiveness of combining speech-driven embeddings with ensemble machine learning for precise, scalable child ESL proficiency assessment.
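
The abstract describes the pipeline only at a high level. The following is a minimal, dependency-free Python sketch of three of the steps it names, under stated assumptions. First, it computes a few transcript-level indicators (word count, type-token ratio, mean sentence length). These are illustrative choices, not the paper's documented feature set. Second, it fuses class probabilities from the Transformer and LightGBM components by weighted soft voting. The fusion rule and the weight `w` are assumptions. Third, it computes accuracy and Pearson correlation, the two metrics the abstract reports. All function names are hypothetical. Whisper transcription, Wav2Vec feature extraction, and model training are not shown.

```python
import math
import re
from statistics import mean


def transcript_features(text: str) -> dict:
    """Hypothetical surface indicators computed from an ASR transcript (e.g. Whisper output)."""
    # Split into sentences on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase word tokens; apostrophes are kept so "don't" stays one token.
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    return {
        "word_count": n_words,
        # Type-token ratio: distinct words / total words (a lexical-diversity proxy).
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        # Mean words per sentence (a crude syntactic-complexity proxy).
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
    }


def soft_vote(p_transformer, p_lgbm, w=0.5):
    """Weighted average of two class-probability vectors; w is an assumed fusion weight."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(p_transformer, p_lgbm)]


def argmax(values):
    """Index of the largest value (the predicted proficiency class)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Pearson correlation between predicted and reference scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    print(transcript_features("I like my dog. My dog likes to run!"))
    fused = soft_vote([0.2, 0.7, 0.1], [0.4, 0.4, 0.2], w=0.5)
    print(argmax(fused))  # → 1
    print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # → 0.75
    print(pearson([1, 2, 3, 4], [1, 3, 2, 4]))
```

Soft voting is only one way to combine a neural model with a gradient-boosted tree model. Stacking, which trains the tree model on the neural model's outputs or embeddings, is another. The abstract does not say which combination the authors use, nor how predictions are paired with reference scores for the Pearson correlation.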
