Few-shot Fine-tuning of BERT Multilingual for Hindi Word Sense Disambiguation
- 1 Department of Computer Science, Assam University, Silchar, India
- 2 Yardi School of Artificial Intelligence, Indian Institute of Technology, Delhi, India
Abstract
Word Sense Disambiguation (WSD) is a fundamental task in Natural Language Processing (NLP): identifying the correct meaning of a word in its context. The task is particularly difficult for morphologically rich, resource-limited languages such as Hindi, where significant lexical ambiguity is compounded by the limited availability of annotated corpora. To address these challenges, we propose a supervised approach that combines the multilingual BERT model (mBERT) with Hindi WordNet as a structured lexical resource. Using few-shot learning, we fine-tune mBERT on a dataset constructed from Hindi WordNet to disambiguate contextually ambiguous words across four parts of speech (POS): nouns, verbs, adjectives, and adverbs. Experiments on standard Hindi WSD benchmarks show that our method significantly outperforms traditional rule-based and embedding-based approaches, achieving 96.48% accuracy, an improvement of approximately 3% over the strongest baseline. These results validate the effectiveness of integrating contextualized embeddings from pre-trained language models with structured lexical databases. They also highlight the promise of hybrid techniques for advancing WSD in low-resource languages and provide a framework applicable to other morphologically complex languages with similar resource constraints.
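As an illustration of the few-shot setup mentioned above, the following minimal sketch shows one way a k-shot training subset could be drawn from sense-labelled sentences. It is not the authors' pipeline: the toy inventory `TOY_EXAMPLES`, the sense identifiers, the helper `sample_few_shot`, and the value of `k` are all hypothetical stand-ins for data that would come from Hindi WordNet.

```python
import random
from collections import defaultdict

# Hypothetical toy records standing in for Hindi WordNet entries:
# (lemma, POS, sense_id, example sentence). All values are illustrative.
TOY_EXAMPLES = [
    ("सोना", "noun", "sona_gold", "उसने बाज़ार से सोना खरीदा।"),
    ("सोना", "noun", "sona_gold", "सोना बहुत महंगा हो गया है।"),
    ("सोना", "noun", "sona_gold", "सोना एक कीमती धातु है।"),
    ("सोना", "verb", "sona_sleep", "बच्चे को जल्दी सोना चाहिए।"),
    ("सोना", "verb", "sona_sleep", "मुझे देर तक सोना पसंद है।"),
]


def sample_few_shot(examples, k, seed=0):
    """Keep at most k labelled sentences per sense (a k-shot subset)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_sense = defaultdict(list)
    for lemma, pos, sense_id, sentence in examples:
        by_sense[sense_id].append(
            {"lemma": lemma, "pos": pos, "sense": sense_id, "text": sentence}
        )
    subset = []
    for sense_id in sorted(by_sense):  # sorted for a stable output order
        pool = by_sense[sense_id]
        rng.shuffle(pool)
        subset.extend(pool[:k])
    return subset


few_shot = sample_few_shot(TOY_EXAMPLES, k=2)
for row in few_shot:
    print(row["sense"], "|", row["text"])
```

Under these assumptions, each retained sentence and its sense label would form one training instance for fine-tuning a classifier on top of mBERT; the sampling step only caps how many labelled examples each sense contributes.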
DOI: https://doi.org/10.3844/jcssp.2025.2631.2646
Copyright: © 2025 Shailendra Kumar Patel, Rakesh Kumar and Anuj Kumar Sirohi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Word Sense Disambiguation
- Hindi Natural Language Processing
- Multilingual BERT
- Hindi WordNet
- Low-Resource Languages
- Few-Shot Learning
- Contextualized Embeddings