Self-Supervised Learning For Global Health Insights Using Multilingual Large Language Models

Authors

  • Amarnath Reddy Kallam

DOI:

https://doi.org/10.64252/bt5y4v73

Keywords:

Self-Supervised Learning, Multilingual Models, Global Health Data, Clinical NLP, Unsupervised Clustering

Abstract

The global distribution of healthcare records across diverse languages presents a substantial barrier to effective real-time public health surveillance. This paper presents a methodology that uses self-supervised learning with multimodal health data to train multilingual large language models at a global scale. It outlines a proposed unsupervised clustering method for processing the multilingual symptom narratives of a clinical dataset. The authors show that by processing patient symptoms and diagnoses with text cleaning, TF-IDF vectorization, and KMeans clustering, they can identify latent structures in the data even in the absence of manual annotations. This will form the basis for real-time, multilingual extraction of insights from global health records, which could be transformational for how public health institutions manage, monitor, and respond to health crises.
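The clustering pipeline named in the abstract (text cleaning, TF-IDF vectorization, KMeans clustering) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The cleaning rule (NFKC normalization, case-folding, `\w+` tokens), the function names `clean_text`, `tfidf_vectors` and `kmeans`, the deterministic farthest-first seeding, and the toy narratives are all hypothetical. It uses only the standard library, so it runs without scikit-learn.

```python
import math
import re
import unicodedata
from collections import Counter


def clean_text(text):
    """Normalize Unicode, case-fold, and split into word tokens."""
    # Hypothetical cleaning rule: NFKC + casefold is one language-agnostic
    # choice; the paper's exact cleaning steps are not specified.
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.findall(r"\w+", text)


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF rows) for a list of strings."""
    tokenized = [clean_text(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: how many narratives contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, the same form scikit-learn uses by default.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard empty docs
        rows.append([x / norm for x in row])
    return vocab, rows


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label per vector."""
    # Assumption: deterministic farthest-first seeding for reproducibility
    # (scikit-learn's KMeans defaults to randomized k-means++ instead).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each narrative joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical toy narratives, not taken from the paper's dataset.
    narratives = [
        "Fever, dry cough and shortness of breath",
        "Fiebre, tos seca y dificultad para respirar",
        "Diarrhea and vomiting after eating street food",
        "Diarrea y vómitos después de comer",
    ]
    _, rows = tfidf_vectors(narratives)
    print(kmeans(rows, k=2))
```

Because this sketch compares surface tokens, narratives in different languages fall into the same cluster only when they share vocabulary. On the toy input, the clusters may therefore split by language rather than by symptom. scikit-learn's `TfidfVectorizer` and `KMeans` are the usual library counterparts of the two hand-written steps.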


Published

2025-08-04

Section

Articles

How to Cite

Self-Supervised Learning For Global Health Insights Using Multilingual Large Language Models. (2025). International Journal of Environmental Sciences, 2870-2876. https://doi.org/10.64252/bt5y4v73