NEO-NDA: Neo Natural Language Data Augmentation

Detailed bibliography
Published in: 2022 IEEE 16th International Conference on Semantic Computing (ICSC), pp. 99-102
Main authors: Ladeira, Lucas Z., Santos, Frances, Cleopas, Lucas, Buteneers, Pieter, Villas, Leandro
Format: Conference paper
Language: English
Published: IEEE, 01.01.2022
Abstract Data augmentation generates synthetic data by modifying data that has already been collected. It is applied to various data types, such as images, audio, and text. For textual data augmentation, many works propose restrictive transformations; for instance, they work with only one language (monolingual) or create samples of a fixed length [1]-[4]. In this work, we propose NEO Natural language Data Augmentation (NEO-NDA), a more comprehensive tool that addresses both data generation and dataset rebalancing. It supports data augmentation of minority classes. NEO-NDA works with multiple languages and implements several distinct transformations to create new samples. Our results show that NEO-NDA boosted the performance of ML models on all evaluated datasets, in some cases doubling performance compared with the original datasets without any data augmentation.
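The record does not describe which transformations NEO-NDA actually uses, so the following is only a minimal sketch of the general idea of rebalancing a dataset by augmenting its minority classes. It swaps in two simple token-level transformations (random swap and random deletion, in the style of EDA) as stand-ins. The names `random_swap`, `random_deletion`, and `rebalance`, the deletion probability `p`, and the whitespace tokenization are all hypothetical assumptions, not NEO-NDA's API or method; whitespace tokenization in particular ignores language-specific tokenization, which a multilingual tool would need to handle.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Swap two randomly chosen token positions (no-op for fewer than 2 tokens)."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def rebalance(samples, seed=0):
    """Hypothetical helper: oversample minority classes with augmented copies
    until every class has as many samples as the largest class.

    samples: list of (text, label) pairs. Tokenization is plain whitespace
    splitting, which is an assumption made only for this sketch.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())

    # Group texts by label so each class is augmented from its own samples.
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)

    augmented = list(samples)  # original samples are kept unchanged
    transforms = [random_swap, random_deletion]
    for label, texts in by_label.items():
        for _ in range(target - counts[label]):
            source = rng.choice(texts).split()
            transform = rng.choice(transforms)
            augmented.append((" ".join(transform(source, rng)), label))
    return augmented
```

For example, with three `"pos"` samples and one `"neg"` sample, `rebalance` appends two augmented `"neg"` samples, so both classes end up with three samples each. Keeping the originals and only adding synthetic copies of minority-class samples is the design choice here; a real tool could instead mix several transformations per sample or generate new text with a language model.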
Author_xml – sequence: 1
  givenname: Lucas Z.
  surname: Ladeira
  fullname: Ladeira, Lucas Z.
  email: lucas.ladeira@ic.unicamp.br
  organization: Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil
– sequence: 2
  givenname: Frances
  surname: Santos
  fullname: Santos, Frances
  email: frances.santos@ic.unicamp.br
  organization: Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil
– sequence: 3
  givenname: Lucas
  surname: Cleopas
  fullname: Cleopas, Lucas
  email: lucas.cleopas@sinch.com
  organization: ML & AI, Sinch, Stockholm County, Sweden
– sequence: 4
  givenname: Pieter
  surname: Buteneers
  fullname: Buteneers, Pieter
  email: pieter.buteneers@sinch.com
  organization: ML & AI, Sinch, Stockholm County, Sweden
– sequence: 5
  givenname: Leandro
  surname: Villas
  fullname: Villas, Leandro
  email: leandro.villas@ic.unicamp.br
  organization: Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICSC52841.2022.00021
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL) (UW System Shared)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781665434188
166543418X
EndPage 102
ExternalDocumentID 9736300
Genre orig-research
ID FETCH-LOGICAL-i133t-73e692f4a2eb9e93a87f77ba9e5c44cd02aa11f7db0eaf31c239aeab7ca4d6453
IEDL.DBID RIE
ISICitedReferencesCount 1
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000835706300013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Thu Jun 29 18:36:55 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i133t-73e692f4a2eb9e93a87f77ba9e5c44cd02aa11f7db0eaf31c239aeab7ca4d6453
PageCount 4
ParticipantIDs ieee_primary_9736300
PublicationCentury 2000
PublicationDate 2022-Jan.
PublicationDateYYYYMMDD 2022-01-01
PublicationDate_xml – month: 01
  year: 2022
  text: 2022-Jan.
PublicationDecade 2020
PublicationTitle 2022 IEEE 16th International Conference on Semantic Computing (ICSC)
PublicationTitleAbbrev ICSC
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 99
SubjectTerms Conferences
data augmentation
Data models
hybrid-approach
multilingual
Natural languages
Semantics
Switches
text classification
Title NEO-NDA: Neo Natural Language Data Augmentation
URI https://ieeexplore.ieee.org/document/9736300
WOSCitedRecordID wos000835706300013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
linkProvider IEEE