IndicBART for Translating Code-Mixed Kannada-English Sentences into Kannada: An Encoder-Decoder Transformer Approach

Bibliographic details
Published in:	2025 5th International Conference on Intelligent Technologies (CONIT), pp. 1-6
Main authors:	N, Shruthi; Sooda, Kavitha
Format:	Conference paper
Language:	English
Publication details:	IEEE, 20 June 2025
Subjects:	Code-mixed texts; Complexity theory; Data models; Encoder-Decoder Transformer Model; Few shot learning; IndicBart; Kannada-English Code-mixed; Multilingual; Neural machine translation; NLP; Semantics; Transformers; Translation
ISBN:	9798331522322

Description
Summary:	Translating Kannada-English code-mixed text continues to pose a major challenge in NLP owing to limited resource availability for Kannada, a low-resource Dravidian language, and the lack of parallel datasets. Existing models struggle with the structural complexity of code-mixed data, leading to suboptimal performance. To address this, we experimented with a transformer-based encoder-decoder model, leveraging two variants of IndicBART, a pre-trained multilingual model. We explored IndicBART's potential for transfer and few-shot learning by fine-tuning it on two Kannada-English code-mixed datasets, one in Roman script and the other in Kannada script, both paired with Kannada translations. Through self-attention and cross-attention mechanisms, IndicBART effectively captured the semantic essence of code-mixed sentences. Our experiments showed that both variants achieved significant BLEU scores of approximately 0.807, with each outperforming the other under different scenarios. This demonstrates their potential for code-mixed translation with minimal data. These findings highlight the effectiveness of our methodologies in tackling code-mixed translation challenges, establishing a basis for continued research in low-resource language settings.
DOI:	10.1109/CONIT65521.2025.11167161
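
Note on the reported metric: the summary gives BLEU on a 0-1 scale (about 0.807) but the record does not say which BLEU implementation, tokenization, or smoothing the authors used. The sketch below is therefore only an illustration of the standard definition (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) under assumed settings: a single reference, whitespace tokenization, and no smoothing. The function names and the toy token strings are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU on a 0-1 scale.

    Assumptions (not from the paper): whitespace tokenization,
    uniform weights over 1..max_n grams, no smoothing.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            # Any zero precision drives the geometric mean to zero.
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c), which is also 1 when the lengths are equal.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Toy placeholder tokens; one substituted word out of six.
    print(round(sentence_bleu("a b c d e x", "a b c d e f"), 3))
```

In the toy case, the n-gram precisions are 5/6, 4/5, 3/4 and 2/3, whose product is 1/3, so the score is the fourth root of 1/3 (roughly 0.76) with no brevity penalty. Corpus-level BLEU, which papers usually report, pools the n-gram counts over all sentences before taking the ratio, so it is not the average of sentence scores.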