Feature Engineering Impact On Fake News Detection

Authors

  • Kaixiang Yang
  • Manual Selvaraj Bexci

DOI:

https://doi.org/10.64252/vyp6ty66

Keywords:

fake news detection, machine learning, deep learning, feature engineering, random forest, cross-validation

Abstract

The rapid spread of misinformation demands efficient detection tools, yet current systems rely heavily on computationally intensive deep learning models such as BERT, which lack transparency. This work introduces an interpretable alternative: a Random Forest (RF) classifier that uses twelve linguistic features, including title-text coherence, punctuation patterns, and lexical diversity, to identify deceptive content. Evaluated on 72,134 articles from the WELFake dataset with stratified cross-validation, the RF model achieves 86.03% accuracy and a ROC-AUC of 0.933, surpassing BERT by 5.5% in accuracy and 0.057 in ROC-AUC. The analysis finds that fake news exhibits 22% lower title-text alignment and 3.1× more exclamation marks than credible sources. The study also critiques conventional evaluation practices, showing that non-stratified splits inflate BERT’s perceived stability. By combining interpretable stylistic cues with CPU-efficient execution, the approach enables scalable deployment in resource-constrained environments, addressing gaps in both performance and operational practicality for real-world moderation systems.
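The kind of stylistic features and stratified splitting described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names `stylistic_features` and `stratified_folds` are hypothetical, only three of the twelve features are approximated, and using Jaccard word overlap as a stand-in for title-text coherence is an assumption made here for illustration.

```python
import random
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stylistic_features(title, body):
    """Compute three illustrative features (assumed definitions, not the paper's)."""
    title_vocab = set(tokenize(title))
    body_tokens = tokenize(body)
    body_vocab = set(body_tokens)
    union = title_vocab | body_vocab
    return {
        # Jaccard overlap between title and body vocabularies (assumed proxy
        # for title-text coherence).
        "title_text_overlap": len(title_vocab & body_vocab) / len(union) if union else 0.0,
        # Raw count of exclamation marks in title and body together.
        "exclamation_count": (title + " " + body).count("!"),
        # Type-token ratio of the body as a simple lexical diversity measure.
        "lexical_diversity": len(body_vocab) / len(body_tokens) if body_tokens else 0.0,
    }


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


features = stylistic_features(
    "Shocking cure found!", "Doctors found a cure! A cure! Really."
)
labels = [0] * 6 + [1] * 4
folds = stratified_folds(labels, k=2)
```

In a fuller pipeline, feature dictionaries like these would be turned into a numeric matrix and passed to a Random Forest implementation, with each fold serving once as the held-out test set. Stratifying keeps the fake-to-real ratio the same in every fold, which is the property the abstract contrasts with non-stratified splits.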

Published

2025-07-02

Section

Articles

How to Cite

Feature Engineering Impact On Fake News Detection. (2025). International Journal of Environmental Sciences, 891-900. https://doi.org/10.64252/vyp6ty66