Stacked Ensemble Learning For Software Defect Prediction: Model Integration And Cross-Dataset Validation

Authors

  • Abhiraj Singh Mohil
  • Akshita Mohil
  • Meenu Mohil

DOI:

https://doi.org/10.64252/9hcqz582

Keywords:

Software defect prediction, Ensemble learning, SMOTE, Feature selection, Cross-Dataset validation

Abstract

Software defect prediction (SDP) is critical to improving software quality and to allocating testing resources efficiently. Traditional machine learning models often generalize poorly and underperform, especially on imbalanced or small datasets. To overcome these limitations, this study proposes a stacked ensemble learning model that combines Random Forest, Gradient Boosting, and AdaBoost as base learners, with logistic regression as the meta-learner. A sample of 500 software modules was drawn from four benchmark datasets: CM1, PC1, JM1, and KC1. Preprocessing comprised stratified sampling, Min-Max normalization, class balancing with the Synthetic Minority Over-sampling Technique (SMOTE), and feature selection via Recursive Feature Elimination (RFE) and mutual information ranking. The models were trained with 10-fold cross-validation, and hyperparameters were optimized using Grid Search. The stacked ensemble outperformed every single classifier on all measures, reaching the highest accuracy of 0.88 with statistically significant improvements in precision, recall, and F1-score (p < 0.05). Data balancing and feature selection also increased model stability and interpretability. In summary, the proposed framework provides a powerful, scalable, and resource-efficient system for predicting software defects. Future studies can replicate this method on larger datasets and explore deep learning-based meta-models for greater adaptability.
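
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn and NumPy, uses synthetic data from make_classification in place of the CM1, PC1, JM1, and KC1 modules, and substitutes a small hand-written SMOTE-style oversampler (smote_oversample is a hypothetical helper) for a library implementation. The RFE estimator, the number of selected features, and all hyperparameters are assumptions. Mutual information ranking, Grid Search, and 10-fold cross-validation are omitted for brevity, and a single held-out split is used instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def smote_oversample(X, y, k=5, seed=0):
    """Hypothetical minimal SMOTE for two classes: create synthetic minority
    samples by interpolating between a minority sample and one of its k
    nearest minority neighbours until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic stand-in for 500 modules with an assumed 15% defect rate.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Min-Max normalization fitted on the training split only.
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Balance the training split only, so the test split stays untouched.
X_bal, y_bal = smote_oversample(X_train_s, y_train)

# Base learners feed out-of-fold predictions to the logistic regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# RFE keeps an assumed 10 of the 20 features before stacking.
model = make_pipeline(
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=10),
    stack,
)
model.fit(X_bal, y_bal)
y_pred = model.predict(X_test_s)
print("F1 on held-out modules:", round(f1_score(y_test, y_pred), 3))
```

Oversampling is applied after the split so that no synthetic sample derived from test modules leaks into training. In a cross-validated setup the same balancing step would need to be repeated inside each training fold.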

Published

2025-10-09

Section

Articles

How to Cite

Stacked Ensemble Learning For Software Defect Prediction: Model Integration And Cross-Dataset Validation. (2025). International Journal of Environmental Sciences, 2171-2181. https://doi.org/10.64252/9hcqz582