Human–machine interaction in building an English reference dataset for natural language processing tasks

Detailed Bibliography
Published in: Language Resources and Evaluation, Vol. 59, No. 3, pp. 2781-2809
Main Authors: Žitko, Branko; Gašpar, Angelina; Bročić, Lucija; Vasić, Daniel; Grubišić, Ani
Format: Journal Article
Language: English
Published: Dordrecht: Springer Nature B.V., 01.09.2025
ISSN: 1574-020X, 1574-0218
Description
Summary: Rich in information and annotated instances, a reference annotated dataset is essential for the training and evaluation of Natural Language Processing (NLP) tools. However, the creation of such linguistic resources is a tedious and time-consuming task involving lexical, syntactic, and semantic annotations, typically at the sentence level. To speed up the human annotation process, we employed pre-trained models (spaCy, AllenNLP, EWISER) to automatically annotate a dataset of 664 sentences (6853 tokens, including 1598 predicates) taken from grammar books. A multi-layered annotation task encompassed Lemmatization (LEM), Part-of-Speech Tagging (UPOS, XPOS), Named Entity Recognition (NER), Dependency Parsing (DEP, HEAD), Coreference Resolution (COREF), Semantic Role Labelling (SRL), Predicate Sense Disambiguation (PSD) and Word Sense Disambiguation (WSD). Three annotators post-edited the noisy automatic annotations, and their average Inter-Annotator Agreement (IAA) across all annotation tasks was 0.91 at the token level and 0.74 at the sentence level. Evaluation metrics including Accuracy, Precision, Recall, and F1 revealed disparities between machine and human annotations, along with correlations between machine annotations at both token and sentence levels. Manual error analysis identified instances where NLP tools failed to generate accurate annotations. A comparison of time spent per layer revealed that refining a pre-annotated subset of sentences required significantly less time than annotating them manually from scratch. This process resulted in an English reference dataset, tailored for the development of a hypergraph-based knowledge extraction model, known as the Natural Language 2 Semantic Hypergraph Dataset (NL2SH 1.0), which is accessible through CLARIN.
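As an illustration of the pre-annotation step described in the abstract, the following Python sketch shows how the lexical and syntactic layers (LEM, UPOS, XPOS, NER, DEP, HEAD) might be produced with a spaCy pipeline. The model name "en_core_web_sm", the pre_annotate helper, and the output layout are assumptions for illustration only, not the authors' actual setup, which also used AllenNLP and EWISER for the semantic layers (COREF, SRL, PSD, WSD).

import spacy

# Any English spaCy pipeline would do; "en_core_web_sm" is an assumption.
nlp = spacy.load("en_core_web_sm")

def pre_annotate(sentence):
    """Return one dict of noisy machine annotations per token."""
    doc = nlp(sentence)
    return [
        {
            "token": tok.text,
            "LEM": tok.lemma_,            # lemmatization layer
            "UPOS": tok.pos_,             # universal part-of-speech tag
            "XPOS": tok.tag_,             # language-specific part-of-speech tag
            "NER": tok.ent_type_ or "O",  # named entity type, "O" if none
            "DEP": tok.dep_,              # dependency relation label
            "HEAD": tok.head.i,           # token index of the syntactic head
        }
        for tok in doc
    ]

for row in pre_annotate("The annotators post-edited the noisy output."):
    print(row)

Similarly, the token-level Inter-Annotator Agreement mentioned in the abstract could, under one common definition, be computed as average pairwise observed agreement across the three annotators; the paper does not specify its exact measure here, so this helper is likewise only a hypothetical sketch.

from itertools import combinations

def token_level_iaa(label_sequences):
    """Average pairwise observed agreement over aligned token labels."""
    scores = [
        sum(x == y for x, y in zip(a, b)) / len(a)
        for a, b in combinations(label_sequences, 2)
    ]
    return sum(scores) / len(scores)

# Toy example with three annotators labelling four tokens each.
ann1 = ["NOUN", "VERB", "DET", "NOUN"]
ann2 = ["NOUN", "VERB", "DET", "NOUN"]
ann3 = ["NOUN", "VERB", "ADP", "NOUN"]
print(token_level_iaa([ann1, ann2, ann3]))  # 0.833... for this toy input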
DOI: 10.1007/s10579-025-09835-2