IndicBART for Translating Code-Mixed Kannada-English Sentences into Kannada: An Encoder-Decoder Transformer Approach

Bibliographic details
Published in: 2025 5th International Conference on Intelligent Technologies (CONIT), pp. 1-6
Main authors: N, Shruthi; Sooda, Kavitha
Format: Conference paper
Language: English
Published: IEEE, 20 June 2025
ISBN: 9798331522322
Online access: Full text
Description
Summary: Translating Kannada-English code-mixed text continues to pose a major challenge in NLP owing to the limited resources available for Kannada, a low-resource Dravidian language, and the lack of parallel datasets. Existing models struggle with the structural complexity of code-mixed data, leading to suboptimal performance. To address this, we experimented with a transformer-based encoder-decoder model, leveraging two variants of IndicBART, a pre-trained multilingual model. We explored IndicBART's potential for transfer and few-shot learning by fine-tuning it on two Kannada-English code-mixed datasets, one in Roman script and the other in Kannada script, both paired with Kannada translations. Through its self-attention and cross-attention mechanisms, IndicBART effectively captured the semantic essence of code-mixed sentences. Our experiments showed that both variants achieved BLEU scores of approximately 0.807, with each outperforming the other under different scenarios, demonstrating their potential for code-mixed translation with minimal data. These findings highlight the effectiveness of our methodology in tackling code-mixed translation challenges and establish a basis for continued research in low-resource language settings.
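To make the described setup concrete, below is a minimal sketch of fine-tuning a pre-trained IndicBART checkpoint on one code-mixed/Kannada sentence pair, assuming the public Hugging Face checkpoint "ai4bharat/IndicBART" and its documented "<2xx>" language-tag format. The example sentence, the choice of "<2en>" as the tag for Roman-script code-mixed input, and all hyperparameters are illustrative assumptions, not the authors' actual data or settings.

```python
import torch
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Load the public IndicBART checkpoint (an assumption; the paper's exact
# variants are not specified here). The model card recommends the slow
# tokenizer with accents preserved.
tokenizer = AutoTokenizer.from_pretrained(
    "ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True
)
model = MBartForConditionalGeneration.from_pretrained("ai4bharat/IndicBART")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# One hypothetical Roman-script code-mixed source with its Kannada target.
# IndicBART formats sources as "text </s> <2src>" and targets as
# "<2tgt> text </s>".
src = "naanu office ge hogutta iddene </s> <2en>"
tgt = "<2kn> ನಾನು ಕಚೇರಿಗೆ ಹೋಗುತ್ತಿದ್ದೇನೆ </s>"

inp = tokenizer(src, add_special_tokens=False, return_tensors="pt").input_ids
out = tokenizer(tgt, add_special_tokens=False, return_tensors="pt").input_ids

# Standard seq2seq teacher forcing: the decoder sees the target shifted
# right by one position; the loss is computed against the shifted labels.
loss = model(input_ids=inp, decoder_input_ids=out[:, :-1], labels=out[:, 1:]).loss
loss.backward()
optimizer.step()

# Inference: force the decoder to start with the Kannada language tag.
model.eval()
generated = model.generate(
    inp,
    num_beams=4,
    max_length=32,
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    decoder_start_token_id=tokenizer.convert_tokens_to_ids("<2kn>"),
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In a full fine-tuning run this forward/backward step would loop over batches of the parallel corpus, and translation quality would be scored against reference Kannada sentences with BLEU, as the abstract reports.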
DOI: 10.1109/CONIT65521.2025.11167161