NEO-NDA: Neo Natural Language Data Augmentation

Bibliographic Details
Published in:	2022 IEEE 16th International Conference on Semantic Computing (ICSC), pp. 99-102
Main Authors:	Ladeira, Lucas Z., Santos, Frances, Cleopas, Lucas, Buteneers, Pieter, Villas, Leandro
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.01.2022
Subjects:	Conferences, data augmentation, Data models, hybrid-approach, multilingual, Natural languages, Semantics, Switches, text classification
Online Access:	Full text

Description
Summary:	Data augmentation generates synthetic data by modifying data already obtained. It is applied to distinct data types such as images, audio, and text. For textual data augmentation, many works propose restrictive transformations; for instance, they only work with one language (monolingual) or create samples with a fixed length [1]-[4]. In this work, we propose NEO Natural language Data Augmentation (NEO-NDA), a more comprehensive tool that addresses both data generation and dataset rebalancing. It supports data augmentation of minority classes. NEO-NDA works with multiple languages and implements distinct transformations to create new samples. Our results show that NEO-NDA boosted the performance of ML models on all datasets evaluated and, in some cases, doubled the performance compared with the original datasets when no data augmentation method is used.
DOI:	10.1109/ICSC52841.2022.00021