Machine Learning Integration of Digital Pathology and Genomic Data to Predict Cancer Progression Risk
DOI: https://doi.org/10.64252/rtksww89
Keywords: cancer, machine, learning
Abstract
Aim: To develop and evaluate machine learning models that integrate digital pathology images and genomic data for accurate prediction of cancer progression risk, thereby improving diagnostic precision and enabling personalized treatment strategies.
Materials and methods: Whole-slide images (WSIs) of hematoxylin and eosin (H&E)-stained tissue samples and genomic data (RNA-seq and mutation profiles) were obtained from The Cancer Genome Atlas (TCGA) for three cancer types: breast, lung, and colon. A total of 500 patients were included (approximately 165 from each cancer type). Each patient had both digital pathology images and matched genomic data. Clinical records included progression-free survival (PFS), which was used to classify patients into progression vs. non-progression groups.
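The abstract does not state how PFS was converted into the two groups. The following is a minimal sketch of one common approach, assuming a fixed time horizon; the 24-month cutoff, the `Patient` record, and the `progression_label` helper are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the abstract does not report the horizon actually used.
HORIZON_MONTHS = 24.0


@dataclass
class Patient:
    patient_id: str
    pfs_months: float   # time to progression or to last follow-up
    progressed: bool    # True if a progression event was observed


def progression_label(p: Patient, horizon: float = HORIZON_MONTHS) -> Optional[int]:
    """Return 1 (progression), 0 (non-progression), or None (undetermined)."""
    if p.progressed and p.pfs_months <= horizon:
        return 1  # progression observed within the horizon
    if p.pfs_months >= horizon:
        return 0  # known to be progression-free through the horizon
    return None   # censored before the horizon: status cannot be assigned


cohort = [
    Patient("P001", 10.0, True),
    Patient("P002", 30.0, False),
    Patient("P003", 12.0, False),
]
# Keep only patients whose label can be determined.
labels = {p.patient_id: progression_label(p) for p in cohort}
usable = {pid: y for pid, y in labels.items() if y is not None}
```

Under this assumed rule, patients censored before the horizon are dropped rather than counted as non-progressors, which avoids labeling incomplete follow-up as a favorable outcome.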
Results: The multimodal machine learning model that integrated digital pathology and genomic data outperformed both unimodal approaches in predicting cancer progression risk. On a test set of 75 patients, the image-only model achieved an accuracy of 71.5% (AUC: 0.75, F1-score: 0.68), and the genomics-only model performed slightly better, with an accuracy of 73.5% (AUC: 0.78, F1-score: 0.71). The multimodal fusion model performed best, with an accuracy of 82.0%, an AUC of 0.88, and an F1-score of 0.80. By cancer type, the fusion model's AUC was 0.89 for breast cancer, 0.86 for lung cancer, and 0.87 for colon cancer; the corresponding 95% confidence intervals indicated consistent performance across the three types.
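For readers less familiar with the reported metrics, the sketch below shows how accuracy, F1-score, and AUC can be computed from binary labels and predicted risk scores, and how two unimodal scores might be combined. The abstract does not describe the fusion architecture, so the weighted-average `late_fusion` helper is an assumed stand-in, and the six-patient arrays are toy values, not study data.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count as one half); equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def late_fusion(a: Sequence[float], b: Sequence[float], w: float = 0.5) -> List[float]:
    """Assumed fusion rule: weighted average of two unimodal risk scores."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def threshold(scores: Sequence[float], cut: float = 0.5) -> List[int]:
    """Convert risk scores to binary predictions."""
    return [int(s >= cut) for s in scores]


# Toy example (illustrative values only): 1 = progression, 0 = non-progression.
y_true = [1, 1, 1, 0, 0, 0]
image_scores = [0.9, 0.4, 0.8, 0.6, 0.2, 0.3]
genomic_scores = [0.6, 0.8, 0.3, 0.2, 0.5, 0.1]
fused_scores = late_fusion(image_scores, genomic_scores)

for name, s in [("image", image_scores), ("genomic", genomic_scores), ("fused", fused_scores)]:
    pred = threshold(s)
    print(name, round(accuracy(y_true, pred), 3), round(f1_score(y_true, pred), 3),
          round(roc_auc(y_true, s), 3))
```

AUC is computed from the raw scores and does not depend on the 0.5 cutoff, whereas accuracy and F1-score do; this is why a model can rank patients well (high AUC) yet show lower accuracy at a fixed threshold.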
Conclusion: Multimodal AI models that integrate digital pathology and genomic data have the potential to improve cancer diagnosis and prognostication and to support more precise, personalized cancer care; however, larger, well-validated studies are required to generate definitive clinical evidence and to support widespread implementation.