Processing Low-Resource Languages: A Review Of Challenges And Strategies For Inclusive NLP And Sustainable Environment
DOI: https://doi.org/10.64252/w55rwj24
Abstract
Recent advances in Natural Language Processing (NLP) have significantly benefited speakers of high-resource languages (HRLs), that is, languages with abundant digital resources. Environmental preservation is one of the most significant areas in which NLP has recently made its mark. However, a significant digital divide persists for low-resource languages (LRLs), which lack such resources. Although many LRLs, such as Assamese, are central to the cultural and linguistic identity of indigenous communities, they remain digitally underrepresented because annotated corpora and computational tools are scarce. This gap is even more critical in the environmental domain, where available knowledge is very limited even for high-resource languages. Multilingual information is essential for NLP applications related to climate change, disaster management and other environmental issues. This paper reviews the challenges of, and strategies for, closing this gap in NLP in both the global and the Indian context. It also highlights the key problems that the lack of digital resources creates when addressing issues in environmental science. Significant factors include the absence of large corpora and labelled datasets, as well as socio-economic constraints. The paper proposes emphasizing data collection and digitization, among other measures, to address this gap. Integrating multilingual pretrained models and transfer learning approaches also offers a pathway to better performance with limited resources. In addition, the paper presents a detailed analysis of the resources available for global and Indian languages and proposes a set of strategic actions, including government policies, for inclusive development in NLP and a sustainable environment.
Bridging this divide and enabling linguistic equity will ensure that all sections of society can participate in inclusive technological growth and in solutions to emerging problems.