NEO-NDA: Neo Natural Language Data Augmentation

Bibliographic details
Published in:	2022 IEEE 16th International Conference on Semantic Computing (ICSC), pp. 99-102
Main authors:	Ladeira, Lucas Z.; Santos, Frances; Cleopas, Lucas; Buteneers, Pieter; Villas, Leandro
Format:	Conference paper
Language:	English
Published:	IEEE, 01.01.2022
Subjects:	Conferences; data augmentation; Data models; hybrid-approach; multilingual; Natural languages; Semantics; Switches; text classification

Abstract	Data augmentation generates synthetic data by modifying data already collected. It is applied to distinct data types such as images, audio, and text. For textual data augmentation, many works propose restrictive transformations; for instance, they work with only one language (monolingual) or create samples with a fixed length [1]-[4]. In this work, we propose NEO Natural language Data Augmentation (NEO-NDA), a more comprehensive tool that addresses both data generation and dataset rebalancing, including augmentation of minority classes. NEO-NDA works with multiple languages and implements distinct transformations to create new samples. Our results show that NEO-NDA boosted the performance of ML models on all datasets evaluated and, in some cases, doubled the performance compared with the original datasets without any data augmentation method.
Author	Cleopas, Lucas; Santos, Frances; Villas, Leandro; Ladeira, Lucas Z.; Buteneers, Pieter
Author_xml	1. Ladeira, Lucas Z. (lucas.ladeira@ic.unicamp.br), Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil; 2. Santos, Frances (frances.santos@ic.unicamp.br), Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil; 3. Cleopas, Lucas (lucas.cleopas@sinch.com), ML & AI, Sinch, Stockholm County, Sweden; 4. Buteneers, Pieter (pieter.buteneers@sinch.com), ML & AI, Sinch, Stockholm County, Sweden; 5. Villas, Leandro (leandro.villas@ic.unicamp.br), Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil
CODEN	IEEPAD
ContentType	Conference Proceeding
DOI	10.1109/ICSC52841.2022.00021
EISBN	9781665434188 166543418X
EndPage	102
ExternalDocumentID	9736300
Genre	orig-research
ISICitedReferencesCount	1
IsPeerReviewed	false
IsScholarly	false
Language	English
PageCount	4
PublicationCentury	2000
PublicationDate	2022-Jan.
PublicationDateYYYYMMDD	2022-01-01
PublicationDate_xml	– month: 01 year: 2022 text: 2022-Jan.
PublicationDecade	2020
PublicationTitle	2022 IEEE 16th International Conference on Semantic Computing (ICSC)
PublicationTitleAbbrev	ICSC
PublicationYear	2022
Publisher	IEEE
Publisher_xml	– name: IEEE
StartPage	99
SubjectTerms	Conferences; data augmentation; Data models; hybrid-approach; multilingual; Natural languages; Semantics; Switches; text classification
Title	NEO-NDA: Neo Natural Language Data Augmentation
URI	https://ieeexplore.ieee.org/document/9736300