Feature Engineering Impact On Fake News Detection

Kaixiang Yang; Manual Selvaraj Bexci

doi:10.64252/vyp6ty66

Authors

Kaixiang Yang
Manual Selvaraj Bexci

DOI:

https://doi.org/10.64252/vyp6ty66

Keywords:

fake news detection, machine learning, deep learning, feature engineering, random forest, cross-validation

Abstract

The rapid spread of misinformation demands efficient detection tools, yet current systems rely heavily on computationally intensive deep learning models such as BERT, which lack transparency. This work introduces an interpretable alternative: a Random Forest (RF) classifier that uses twelve linguistic features, such as title-text coherence, punctuation patterns, and lexical diversity, to identify deceptive content. Evaluated on 72,134 articles from the WELFake dataset with stratified cross-validation, the RF model achieves 86.03% accuracy and a ROC-AUC of 0.933, surpassing BERT by 5.5% in accuracy and 0.057 in ROC-AUC. The analysis shows that fake news exhibits 22% lower title-text alignment and 3.1× more exclamation marks than credible sources. The study also critiques conventional evaluation practices, showing that non-stratified splits inflate BERT’s perceived stability. By combining interpretable stylistic cues with CPU-efficient execution, this approach enables scalable deployment in resource-constrained environments, addressing gaps in both performance and operational practicality for real-world moderation systems.
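
To make the kind of stylistic cue named in the abstract concrete, the following is a minimal sketch of three such features: a title-text overlap score, an exclamation count, and a lexical diversity ratio. The abstract does not give the authors' feature definitions, so the formulas here (Jaccard overlap and type-token ratio), the tokenizer, and the function names `tokenize` and `stylistic_features` are assumptions chosen for illustration only.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stylistic_features(title, body):
    """Compute three illustrative stylistic features for one article.

    These are assumed stand-ins for features of the kind the abstract names,
    not the authors' definitions.
    """
    title_vocab = set(tokenize(title))
    body_tokens = tokenize(body)
    body_vocab = set(body_tokens)
    union = title_vocab | body_vocab

    return {
        # Jaccard overlap between title and body vocabularies (assumed proxy
        # for title-text coherence).
        "title_text_overlap": len(title_vocab & body_vocab) / len(union) if union else 0.0,
        # Raw count of exclamation marks in title and body together.
        "exclamation_count": title.count("!") + body.count("!"),
        # Type-token ratio of the body (assumed proxy for lexical diversity).
        "lexical_diversity": len(body_vocab) / len(body_tokens) if body_tokens else 0.0,
    }


sober = stylistic_features(
    "Council approves budget",
    "The council approves the budget",
)
sensational = stylistic_features(
    "Shocking cure found!",
    "You will not believe this!!! Click now!",
)
```

In this toy pair, the sober article scores higher on title-text overlap and the sensational one carries more exclamation marks, which is the direction of the differences the abstract reports. Feature dictionaries of this shape could be stacked into a matrix and passed to any tree-based classifier; that step is omitted here because the abstract does not specify the model's hyperparameters.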

