Integrated Statistical and AI Models for Early Warning Systems in Environmental Toxicology: A Case Study on Waterborne Heavy Metal Contamination
DOI: https://doi.org/10.64252/82p6h343
Keywords: Environmental toxicology, heavy metals, machine learning, early warning system, Random Forest, water quality prediction, PCA, feature importance, GIS-based monitoring.
Abstract
Waterborne heavy metal contamination poses a critical threat to both ecosystem health and human well-being, particularly in developing regions lacking robust environmental monitoring infrastructure. This study presents an integrated approach combining traditional statistical analysis with artificial intelligence (AI) models to develop an early warning system (EWS) for detecting and predicting heavy metal pollution in surface waters. Water samples were collected at 20 locations over three seasons and analyzed for key physico-chemical parameters (pH, EC, DO, TDS, turbidity, temperature) and six priority heavy metals (As, Cd, Pb, Cr, Hg, Ni). Multivariate statistical tools, including Principal Component Analysis (PCA), were used to identify pollution gradients and key influencing factors. Four machine learning models (Random Forest, Support Vector Regression, Gradient Boosting, and Artificial Neural Networks) were trained to forecast heavy metal concentrations from environmental variables. Random Forest performed best, with an R² of 0.91 and the lowest RMSE of the four models, highlighting its predictive reliability. Feature importance analysis identified TDS, EC, and turbidity as the strongest predictors of contamination. The results were used to build a GIS-compatible early warning framework capable of classifying contamination risk zones in near real time. This study offers a replicable and scalable model for predictive toxicology and environmental management, enabling data-driven, preemptive responses to contamination events.
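The model comparison reported in the abstract rests on two standard metrics, R² and RMSE. As a minimal sketch of how such a ranking could be computed, the Python below scores two candidate models against observed values. The function names, the model labels `model_a` and `model_b`, and all numbers are hypothetical and chosen for illustration only; none of them come from the study's data or code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Return (name, r2, rmse) tuples, best first (highest R², then lowest RMSE)."""
    scored = [
        (name, r_squared(y_true, y_pred), rmse(y_true, y_pred))
        for name, y_pred in predictions.items()
    ]
    return sorted(scored, key=lambda s: (-s[1], s[2]))


# Made-up observed and predicted concentrations (arbitrary units),
# NOT values from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8, 5.0],
    "model_b": [1.5, 2.5, 2.5, 4.5, 4.5],
}

ranking = rank_models(observed, predictions)
for name, r2, err in ranking:
    print(f"{name}: R2={r2:.3f} RMSE={err:.3f}")
```

In practice the predictions would come from held-out test samples for each trained model, so that the ranking reflects predictive skill rather than fit to the training data.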