Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort

Bibliographic Details
Published in:	2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 287-288
Main Authors:	Weeber, Franziska; Hamborg, Felix; Donnay, Karsten; Gipp, Bela
Format:	Conference Paper
Language:	English
Published:	IEEE, 1 September 2021
Subjects:	active learning; Analytical models; Annotations; Costs; data annotation; Data models; Deep learning; Longformer; Manuals; text classification; Training

Description
Summary:	Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations, thus strongly reducing annotation cost and effort. For this purpose, we combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories in the given text documents. To highlight our research direction's potential, we evaluate the approach on the task of identifying frames in news articles. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even these complex and subtle frames. On the framing dataset, the AL approach needs only 16.3% of the annotations to reach the same performance as a model trained on the full dataset.
DOI:	10.1109/JCDL52503.2021.00038
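
Illustrative note: the summary describes pool-based active learning, in which a model repeatedly selects the examples it most needs a human to annotate. The record does not say which query strategy or code the authors used, so the following is only a minimal sketch under stated assumptions: it substitutes a toy nearest-centroid classifier for the pre-trained language model and uses least-confidence sampling as the query strategy. All function names, labels, and data here are hypothetical.

```python
import math
import random


def train_centroids(labeled):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, {}
    for x, y in labeled:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict_proba(centroids, x):
    """Softmax over negative distances to each centroid."""
    scores = {y: -math.dist(c, x) for y, c in centroids.items()}
    top = max(scores.values())
    exp = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exp.values())
    return {y: e / total for y, e in exp.items()}


def active_learning(pool, oracle, seed_idx, budget):
    """Ask the oracle to label the least-confident pool item, `budget` times."""
    seeds = set(seed_idx)
    labeled = [(pool[i], oracle(pool[i])) for i in seed_idx]
    unlabeled = [i for i in range(len(pool)) if i not in seeds]
    for _ in range(budget):
        model = train_centroids(labeled)
        # Least confidence (assumed strategy): smallest top-class probability.
        pick = min(
            unlabeled,
            key=lambda i: max(predict_proba(model, pool[i]).values()),
        )
        unlabeled.remove(pick)
        labeled.append((pool[pick], oracle(pool[pick])))
    return train_centroids(labeled), labeled


# Hypothetical data: 2-D points stand in for document embeddings, and the
# oracle stands in for a human annotator assigning one of two frame labels.
rng = random.Random(0)
pool = [(rng.gauss(-2, 1), rng.gauss(0, 1)) for _ in range(100)]
pool += [(rng.gauss(2, 1), rng.gauss(0, 1)) for _ in range(100)]


def oracle(x):
    return "frame_a" if x[0] < 0 else "frame_b"


# Seed set: one manually annotated example per label.
seed_idx = [
    next(i for i, p in enumerate(pool) if oracle(p) == "frame_a"),
    next(i for i, p in enumerate(pool) if oracle(p) == "frame_b"),
]
model, labeled = active_learning(pool, oracle, seed_idx, budget=10)
print(f"annotated {len(labeled)} of {len(pool)} pool items")
```

In a realistic setting the classifier would be retrained or fine-tuned on text after each batch of annotations, and the loop would stop once a held-out performance target is reached rather than after a fixed budget.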