IndicBART for Translating Code-Mixed Kannada-English Sentences into Kannada: An Encoder-Decoder Transformer Approach

Published in: 2025 5th International Conference on Intelligent Technologies (CONIT), pp. 1-6
Main authors: N, Shruthi; Sooda, Kavitha
Format: Conference paper
Language: English
Publication details: IEEE, 20 June 2025
ISBN: 9798331522322
Abstract Translating Kannada-English code-mixed text continues to pose a major challenge in NLP owing to the limited resource availability for Kannada, a low-resource Dravidian language, and the lack of parallel datasets. Existing models struggle with the structural complexity of code-mixed data, leading to suboptimal performance. To address this, we experimented with a transformer-based encoder-decoder model, leveraging two variants of IndicBART, a pre-trained multilingual model. We explored IndicBART's potential for transfer and few-shot learning by fine-tuning it on two Kannada-English code-mixed datasets: one in Roman script and the other in Kannada script, both paired with Kannada translations. Through self-attention and cross-attention mechanisms, IndicBART effectively captured the semantic essence of code-mixed sentences. Our experiments showed that both variants achieved significant BLEU scores of approximately 0.807, with each variant outperforming the other under different scenarios. This demonstrates their potential for code-mixed translation with minimal data. These findings highlight the effectiveness of our methodologies in tackling code-mixed translation challenges, establishing a basis for continued research in low-resource language settings.
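A minimal illustrative sketch (not the authors' released code) of the approach the abstract describes: loading the public ai4bharat/IndicBART checkpoint with Hugging Face Transformers, running one fine-tuning step on a single code-mixed Kannada-English to Kannada pair, generating a translation, and scoring it with sacrebleu. The checkpoint name, language tags, example sentences, hyperparameters, and scoring library are all assumptions, since this record does not include the paper's training configuration; note also that the public checkpoint was trained with Indic scripts mapped to Devanagari, a preprocessing detail glossed over here.

import torch
import sacrebleu
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed public IndicBART checkpoint; the paper fine-tunes two IndicBART variants.
MODEL_NAME = "ai4bharat/IndicBART"
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME, do_lower_case=False, use_fast=False, keep_accents=True
)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# IndicBART is trained with inputs of the form "sentence </s> <2src>" and
# targets of the form "<2tgt> sentence </s>". The pair below is a hypothetical
# Romanized code-mixed source with its Kannada translation.
src_text = "nanage ice cream thumba ishta </s> <2en>"
tgt_text = "<2kn> ನನಗೆ ಐಸ್ ಕ್ರೀಮ್ ತುಂಬಾ ಇಷ್ಟ </s>"

enc = tokenizer(src_text, add_special_tokens=False, return_tensors="pt").input_ids
dec = tokenizer(tgt_text, add_special_tokens=False, return_tensors="pt").input_ids

# One teacher-forced fine-tuning step (real training would batch, pad, and
# mask padded label positions with -100).
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
loss = model(input_ids=enc, decoder_input_ids=dec[:, :-1], labels=dec[:, 1:]).loss
loss.backward()
optimizer.step()

# Beam-search decoding into Kannada after fine-tuning.
model.eval()
generated = model.generate(
    enc,
    num_beams=4,
    max_length=64,
    early_stopping=True,
    pad_token_id=tokenizer._convert_token_to_id_with_added_voc("<pad>"),
    bos_token_id=tokenizer._convert_token_to_id_with_added_voc("<s>"),
    eos_token_id=tokenizer._convert_token_to_id_with_added_voc("</s>"),
    decoder_start_token_id=tokenizer._convert_token_to_id_with_added_voc("<2kn>"),
)
hypothesis = tokenizer.decode(generated[0], skip_special_tokens=True)

# Corpus BLEU over the (here, single-sentence) test set; the abstract's ~0.807,
# if reported on a 0-1 scale, corresponds to ~80.7 on sacrebleu's 0-100 scale.
reference = "ನನಗೆ ಐಸ್ ಕ್ರೀಮ್ ತುಂಬಾ ಇಷ್ಟ"
print(sacrebleu.corpus_bleu([hypothesis], [[reference]]).score)

The sketch only illustrates the fine-tuning and scoring mechanics; in the setup the abstract describes, it is the decoder's cross-attention over the encoder states that lets the Kannada output condition on both the English and Kannada fragments of the mixed input.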
Author N, Shruthi
Sooda, Kavitha
Author_xml – sequence: 1
  givenname: Shruthi
  surname: N
  fullname: N, Shruthi
  email: imshruthin29@gmail.com
  organization: Dept. of CSE, B.M.S. College of Engineering, Bangalore, India
– sequence: 2
  givenname: Kavitha
  surname: Sooda
  fullname: Sooda, Kavitha
  email: kavithas.cse@bmsce.ac.in
  organization: Dept. of CSE, B.M.S. College of Engineering, Bangalore, India
ContentType Conference Proceeding
DOI 10.1109/CONIT65521.2025.11167161
EISBN 9798331522308
9798331522339
EndPage 6
ExternalDocumentID 11167161
Genre orig-research
ISBN 9798331522322
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
PublicationCentury 2000
PublicationDate 2025-June-20
PublicationDateYYYYMMDD 2025-06-20
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-20
  day: 20
PublicationDecade 2020
PublicationTitle 2025 5th International Conference on Intelligent Technologies (CONIT)
PublicationTitleAbbrev CONIT
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1
SubjectTerms Code-mixed texts
Complexity theory
Data models
Encoder-Decoder Transformer Model
Few shot learning
IndicBart
Kannada-English Code-mixed
Multilingual
Neural machine translation
NLP
Semantics
Transformers
Translation
Title IndicBART for Translating Code-Mixed Kannada-English Sentences into Kannada: An Encoder-Decoder Transformer Approach
URI https://ieeexplore.ieee.org/document/11167161