Optimizing Contextual Embedding-Based Text Classification: A Comparative Analysis Of PSO And GSO Feature Selection Approach
DOI:
https://doi.org/10.64252/268n7n80
Keywords:
Text Classification, Contextual Embeddings, Particle Swarm Optimization, Glowworm Swarm Optimization, Feature Selection, AUC-ROC, Execution Time Analysis
Abstract
Text classification is a central task in natural language processing, and feature engineering and optimization are critical levers for improving predictive performance. Conventional pipelines often struggle to reconcile the competing demands of a reduced feature space, model transparency, and accuracy. This study presents a novel hybrid architecture that builds a fused feature representation around contextual embeddings and fine-tunes it with the complementary heuristics of Particle Swarm Optimization (PSO) and Glowworm Swarm Optimization (GSO). The proposed framework assembles features from term frequency-inverse document frequency (TF-IDF), pre-trained word embeddings, and condensed representations generated by singular value decomposition (SVD) and principal component analysis (PCA). Subsequent dimensionality refinement is governed by a bi-level optimization scheme that alternates between PSO and GSO. Benchmarks on the curated train and test splits show marked improvements over a baseline that pairs TF-IDF with logistic regression. The PSO-tuned ensemble attained an accuracy of 96.82%, a macro F1-score of 96.77%, and an area under the receiver operating characteristic curve (AUC-ROC) of 0.987, while the GSO-tuned variant recorded 96.95%, 96.91%, and 0.989, respectively. Runtime profiling under proxy settings in Google Colab with GPU support showed that the PSO variant required 39.0 minutes of processing and the GSO variant 41.5 minutes, indicating that both are operationally viable. Confusion matrices, disaggregated performance metrics, AUC-ROC visualizations, log-loss trajectories, and hyperparameter convergence profiles together identify the GSO-tuned variant as the stronger of the two, despite its marginally higher computational cost.
This work highlights the value of swarm-intelligence-driven optimization in refining contextual-embedding-based text classification and indicates the method’s broader applicability to NLP domains that demand high accuracy and robust feature selection.
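To make the idea of swarm-based feature selection over a fused feature matrix concrete, the following is a minimal sketch of a binary PSO selector; it is not the implementation evaluated in this study. Several parts are assumptions made purely for illustration: the helper names (`make_toy_fused_features`, `centroid_accuracy`, `binary_pso_select`), the synthetic data that stands in for the TF-IDF, embedding, and SVD/PCA blocks, the nearest-centroid fitness proxy, the sigmoid transfer rule, and all hyperparameter values.

```python
import numpy as np


def make_toy_fused_features(n=200, seed=0):
    """Hypothetical stand-in for a fused feature matrix (not the paper's data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, n)
    informative = rng.normal(0.0, 1.0, (n, 4)) + 2.0 * y[:, None]
    noise = rng.normal(0.0, 1.0, (n, 16))
    # Fusion by column concatenation: 4 informative + 16 noise columns.
    X = np.hstack([informative, noise])
    half = n // 2
    return X[:half], y[:half], X[half:], y[half:]


def centroid_accuracy(X_tr, y_tr, X_val, y_val, mask):
    """Fitness proxy (an assumption): nearest-centroid accuracy on selected columns."""
    if not mask.any():
        return 0.0
    A, B = X_tr[:, mask], X_val[:, mask]
    classes = np.unique(y_tr)
    centroids = np.stack([A[y_tr == c].mean(axis=0) for c in classes])
    # Squared distance from every validation row to every class centroid.
    dist = ((B[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dist.argmin(axis=1)] == y_val).mean())


def binary_pso_select(X_tr, y_tr, X_val, y_val, n_particles=12, n_iter=25,
                      w=0.7, c1=1.5, c2=1.5, penalty=0.01, seed=0):
    """Binary PSO: each particle is a 0/1 vector marking which columns to keep."""
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]

    def fitness(bits):
        mask = bits.astype(bool)
        acc = centroid_accuracy(X_tr, y_tr, X_val, y_val, mask)
        # Small penalty on the fraction of columns kept favours compact subsets.
        return acc - penalty * mask.mean()

    pos = (rng.random((n_particles, n_feat)) < 0.5).astype(float)
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_feat))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(pbest_fit.argmax())
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    for _ in range(n_iter):
        r1 = rng.random(vel.shape)
        r2 = rng.random(vel.shape)
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        # Sigmoid transfer: velocity becomes the probability that a bit is 1.
        prob = 1.0 / (1.0 + np.exp(-vel))
        pos = (rng.random(vel.shape) < prob).astype(float)

        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(pbest_fit.argmax())
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    return gbest.astype(bool), gbest_fit


X_tr, y_tr, X_val, y_val = make_toy_fused_features(seed=0)
selected, score = binary_pso_select(X_tr, y_tr, X_val, y_val, seed=1)
print("columns kept:", np.flatnonzero(selected), "fitness:", round(score, 3))
```

In a real pipeline the toy matrix would be replaced by the actual fused features and the fitness proxy by the validation score of the downstream classifier. A GSO variant would keep the same bit-mask encoding but replace the velocity update with luciferin-based movement toward brighter neighbours.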




