Fake Job Detection: A Comparative Study of Machine Learning and Hybrid Network-Based Approaches
DOI: https://doi.org/10.64252/j7g8v063
Keywords: Fake Job Detection • Machine Learning • Network Analysis • Hybrid Models
Abstract
Fake job postings on online platforms pose a significant threat, often enabling financial scams and identity theft. Existing detection systems rely primarily on machine learning over textual features and therefore struggle to capture relational patterns among jobs, companies, and locations. To overcome this limitation, we propose a hybrid framework that integrates structural insights from network analysis with traditional text-based machine learning. Specifically, we construct a heterogeneous job–company–location graph from the EMSCAD dataset and extract relational features such as degree centrality, betweenness centrality, clustering coefficient, and community structure. These network-derived signals complement TF-IDF textual vectors by uncovering hidden associations, such as coordinated fraudulent postings or abnormal company–location linkages, that text-only models fail to detect. When combined with classifiers including Naive Bayes, Logistic Regression, and Random Forest, the hybrid features consistently improve robustness and predictive performance over text-only models. The Random Forest hybrid model achieves an F1-score of 0.4434 and a ROC-AUC of 0.9944, surpassing ML-only baselines (F1-score: 0.3930, ROC-AUC: 0.9411). This work is novel in explicitly integrating structural network features with textual analysis for fake job detection, offering a scalable and resilient framework for combating recruitment fraud.
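The abstract does not say how the graph's edges are defined or how node-level metrics are attached to each posting. The Python sketch below shows one plausible way to turn postings into per-job structural features, under stated assumptions: each posting links its job node to its company and location, and the company is also linked to that location (without such company–location edges every clustering coefficient would be zero). Each job then inherits its company's degree centrality, betweenness centrality (Brandes' algorithm) and clustering coefficient. As a simple stand-in for the paper's community-structure feature, it uses the size of the job's connected component rather than a community detection algorithm such as Louvain. The function names and the `job_id`/`company`/`location` fields are hypothetical illustrations, not the authors' code or the EMSCAD schema.

```python
from collections import defaultdict, deque

def build_graph(postings):
    # Assumed edge scheme (not taken from the paper): each posting links its job
    # node to its company and location, and the company to that location.
    adj = defaultdict(set)
    for p in postings:
        job = ("job", p["job_id"])
        company = ("company", p["company"])
        location = ("location", p["location"])
        for u, v in ((job, company), (job, location), (company, location)):
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def clustering(adj, v):
    # Local clustering coefficient: linked neighbour pairs / possible pairs.
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(nbrs[j] in adj[nbrs[i]] for i in range(k) for j in range(i + 1, k))
    return 2 * links / (k * (k - 1))

def betweenness(adj):
    # Brandes' algorithm (unweighted); dividing by (n-1)(n-2) normalises because
    # each unordered pair is counted once from each endpoint.
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS from s, counting shortest paths into sigma
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate pair dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}

def component_sizes(adj):
    # Stand-in for community detection: size of each node's connected component.
    size = {}
    for start in adj:
        if start in size:
            continue
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        for v in seen:
            size[v] = len(seen)
    return size

def job_features(postings):
    # One row per job: company degree centrality, company betweenness,
    # company clustering coefficient, and the job's component size.
    adj = build_graph(postings)
    n, bc, comp = len(adj), betweenness(adj), component_sizes(adj)
    rows = {}
    for p in postings:
        company = ("company", p["company"])
        rows[p["job_id"]] = [
            len(adj[company]) / (n - 1),
            bc[company],
            clustering(adj, company),
            comp[("job", p["job_id"])],
        ]
    return rows

postings = [  # toy data with hypothetical field names
    {"job_id": 1, "company": "Acme", "location": "NYC"},
    {"job_id": 2, "company": "Acme", "location": "NYC"},
    {"job_id": 3, "company": "Acme", "location": "LA"},
    {"job_id": 4, "company": "Shell Co", "location": "Remote"},
]
for job_id, row in job_features(postings).items():
    print(job_id, [round(x, 4) for x in row])
```

A pure-Python Brandes pass is slow on a full dataset, so in practice a graph library such as NetworkX (`betweenness_centrality`, `clustering`, `networkx.community.louvain_communities`) would be the more likely choice. The resulting rows could then be appended as extra columns to the TF-IDF matrix, for example with scikit-learn's `TfidfVectorizer` and `scipy.sparse.hstack`, before fitting the Naive Bayes, Logistic Regression, or Random Forest classifiers.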