Implementation of Single Candidate Loss Optimization Algorithm for Loss Optimization of Bhojpuri-English Machine Translation Model
- 1 Department of Computer Science and IT, MJP Rohilkhand University, Bareilly, India
- 2 Department of Computer Science and Information, Taibah University, Madinah, Saudi Arabia
Abstract
Machine translation for low-resource Indian languages is necessary because people in most regions still speak their own regional dialects and are not comfortable with English. Indian languages are morphologically rich, which creates two major challenges for researchers during translation: ambiguity and domain adaptation. The scarcity of data makes the task harder still. In this study, we propose a novel machine translation model that uses a single candidate optimization algorithm for loss optimization, and our results show that it performs better than traditional gradient-based algorithms. We use byte pair encoding for tokenization and BERT for contextualized word embeddings. The novelty of our model is that a standard transformer is trained with a single candidate optimization technique for loss optimization, rather than traditional gradient-based techniques, in order to avoid overfitting. The results are compared with other state-of-the-art models and presented in tabular form.
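The abstract does not state the update rule of the single candidate optimizer, so the sketch below is only an illustration of the general idea of gradient-free, single-candidate loss optimization. As an assumed stand-in (not the authors' algorithm and not the published SCO update equations), it uses a (1+1) evolution strategy: one candidate parameter vector is perturbed with Gaussian noise, a trial is kept only if it lowers the loss, and the step size is widened on success and narrowed on failure (a 1/5-style rule). The function name `single_candidate_search` and the toy linear-regression loss are hypothetical; in the paper the loss would be the transformer's training loss.

```python
import random
from typing import Callable, List, Tuple


def single_candidate_search(
    loss_fn: Callable[[List[float]], float],
    init: List[float],
    step: float = 0.5,
    iters: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise loss_fn while keeping exactly ONE candidate solution.

    Hypothetical stand-in for a single candidate optimizer: a (1+1)
    evolution strategy with a 1/5-style step-size rule. No gradients
    are computed at any point.
    """
    rng = random.Random(seed)
    best = list(init)
    best_loss = loss_fn(best)
    for _ in range(iters):
        # Propose one trial by adding Gaussian noise to every parameter.
        trial = [w + rng.gauss(0.0, step) for w in best]
        trial_loss = loss_fn(trial)
        if trial_loss < best_loss:
            # Greedy acceptance: the single candidate only ever improves.
            best, best_loss = trial, trial_loss
            step *= 1.5  # success: widen the search
        else:
            step *= 0.9  # failure: narrow the search
    return best, best_loss


# Toy usage (hypothetical data): fit y = 2x + 1 with parameters [w, b]
# by minimising mean squared error.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]


def mse(params: List[float]) -> float:
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    params, loss = single_candidate_search(mse, init=[0.0, 0.0])
    print(f"w={params[0]:.3f}, b={params[1]:.3f}, loss={loss:.6f}")
```

The multipliers 1.5 and 0.9 balance when roughly one trial in five succeeds, which keeps the step size matched to the local shape of the loss. Because only improving trials are accepted, the returned loss can never exceed the loss of the initial candidate.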
DOI: https://doi.org/10.3844/jcssp.2025.1059.1070
Copyright: © 2025 Rituraj Dixit, Sarabjeet Singh Bedi, Ibrahim Aljubayri and Mohammad Zubair Khan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Low Resource Language
- Machine Translation
- Byte Pair Encoding
- Fine-Tuning
- Loss Optimization