NEO-NDA: Neo Natural Language Data Augmentation
| Published in: | 2022 IEEE 16th International Conference on Semantic Computing (ICSC), pp. 99-102 |
|---|---|
| Main authors: | Ladeira, Lucas Z.; Santos, Frances; Cleopas, Lucas; Buteneers, Pieter; Villas, Leandro |
| Medium: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 01.01.2022 |
| Subject: | |
| Abstract | Data augmentation generates synthetic data by modifying data that has already been collected. It is applied to distinct data types such as images, audio, and text. For textual data augmentation, many works propose restrictive transformations: for instance, they only work with one language (monolingual) or create samples with a fixed length [1]-[4]. In this work, we propose NEO Natural language Data Augmentation (NEO-NDA), a more comprehensive tool able to address both data generation and dataset rebalancing. It supports data augmentation of minority classes. NEO-NDA works with multiple languages and implements several distinct transformations to create new samples. Our results show that NEO-NDA boosted the performance of ML models on all datasets evaluated and, in some cases, doubled it compared with the original datasets when no data augmentation method is used. |
|---|---|
| Author | Cleopas, Lucas; Santos, Frances; Villas, Leandro; Ladeira, Lucas Z.; Buteneers, Pieter |
| Author_xml | 1. Ladeira, Lucas Z. (lucas.ladeira@ic.unicamp.br), Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil; 2. Santos, Frances (frances.santos@ic.unicamp.br), Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil; 3. Cleopas, Lucas (lucas.cleopas@sinch.com), ML & AI, Sinch, Stockholm County, Sweden; 4. Buteneers, Pieter (pieter.buteneers@sinch.com), ML & AI, Sinch, Stockholm County, Sweden; 5. Villas, Leandro (leandro.villas@ic.unicamp.br), Institute of Computing, University of Campinas (Unicamp), São Paulo, Brazil |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICSC52841.2022.00021 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library (IEL) (UW System Shared); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781665434188; 166543418X |
| EndPage | 102 |
| ExternalDocumentID | 9736300 |
| Genre | orig-research |
| ID | FETCH-LOGICAL-i133t-73e692f4a2eb9e93a87f77ba9e5c44cd02aa11f7db0eaf31c239aeab7ca4d6453 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000835706300013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Thu Jun 29 18:36:55 EDT 2023 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 4 |
| ParticipantIDs | ieee_primary_9736300 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-Jan. |
| PublicationDateYYYYMMDD | 2022-01-01 |
| PublicationDate_xml | – month: 01 year: 2022 text: 2022-Jan. |
| PublicationDecade | 2020 |
| PublicationTitle | 2022 IEEE 16th International Conference on Semantic Computing (ICSC) |
| PublicationTitleAbbrev | ICSC |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 1.7882555 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 99 |
| SubjectTerms | Conferences; data augmentation; Data models; hybrid-approach; multilingual; Natural languages; Semantics; Switches; text classification |
| Title | NEO-NDA: Neo Natural Language Data Augmentation |
| URI | https://ieeexplore.ieee.org/document/9736300 |
| WOSCitedRecordID | wos000835706300013 |
| hasFullText | 1 |
| inHoldings | 1 |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2022+IEEE+16th+International+Conference+on+Semantic+Computing+%28ICSC%29&rft.atitle=NEO-NDA%3A+Neo+Natural+Language+Data+Augmentation&rft.au=Ladeira%2C+Lucas+Z.&rft.au=Santos%2C+Frances&rft.au=Cleopas%2C+Lucas&rft.au=Buteneers%2C+Pieter&rft.date=2022-01-01&rft.pub=IEEE&rft.spage=99&rft.epage=102&rft_id=info:doi/10.1109%2FICSC52841.2022.00021&rft.externalDocID=9736300 |
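The abstract describes rebalancing a dataset by generating new text samples for minority classes. The Python below is a minimal sketch of that general idea only. It is not the NEO-NDA implementation, whose specific transformations this record does not describe. It oversamples every minority class with augmented copies until each class matches the largest one, using two generic token-level transformations (random swap and random deletion) as stand-ins. The function names `random_swap`, `random_deletion`, and `rebalance` are hypothetical. Whitespace tokenization is also an assumption, and it only suits space-delimited languages.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Hypothetical transformation: swap two randomly chosen token positions."""
    tokens = tokens[:]  # work on a copy so the source sample is untouched
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Hypothetical transformation: drop each token with probability p,
    keeping at least one token so the sample never becomes empty."""
    if not tokens:
        return tokens
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def rebalance(samples, seed=0):
    """Oversample minority classes with augmented copies until every class
    has as many samples as the largest class.

    samples: list of (text, label) pairs. Original samples are kept first,
    and the augmented ones are appended after them.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())

    # Group source texts by label so new samples inherit their class.
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)

    augmented = list(samples)
    transforms = [random_swap, random_deletion]
    for label, texts in by_label.items():
        for _ in range(target - counts[label]):
            source = rng.choice(texts)
            transform = rng.choice(transforms)
            # Whitespace tokenization is an assumption of this sketch.
            new_tokens = transform(source.split(), rng)
            augmented.append((" ".join(new_tokens), label))
    return augmented
```

For example, `rebalance([("the movie was great", "pos"), ("I loved it", "pos"), ("fantastic acting overall", "pos"), ("boring plot", "neg")])` keeps the four originals and appends two perturbed `"neg"` samples, which leaves three samples per class. A real tool would likely use richer, language-aware transformations. The swap and deletion operations are used here only because they are easy to verify.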