Human–machine interaction in building an English reference dataset for natural language processing tasks
| Published in: | Language Resources and Evaluation, Vol. 59, No. 3, pp. 2781–2809 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Dordrecht: Springer Nature B.V., 01.09.2025 |
| ISSN: | 1574-020X, 1574-0218 |
| Summary: | Rich in information and annotated instances, a reference annotated dataset is essential for the training and evaluation of Natural Language Processing (NLP) tools. However, the creation of such linguistic resources is a tedious and time-consuming task involving lexical, syntactic, and semantic annotations, typically at the sentence level. To speed up the human annotation process, we employed pre-trained models (spaCy, AllenNLP, EWISER) to automatically annotate a dataset of 664 sentences (6853 tokens, including 1598 predicates) taken from grammar books. A multi-layered annotation task encompassed Lemmatization (LEM), Part-of-Speech Tagging (UPOS, XPOS), Named Entity Recognition (NER), Dependency Parsing (DEP, HEAD), Coreference Resolution (COREF), Semantic Role Labelling (SRL), Predicate Sense Disambiguation (PSD), and Word Sense Disambiguation (WSD). Three annotators post-edited the noisy automatic annotations; their average Inter-Annotator Agreement (IAA) across all annotation tasks was 0.91 at the token level and 0.74 at the sentence level. Evaluation metrics including Accuracy, Precision, Recall, and F1 revealed disparities between machine and human annotations, along with correlations between machine annotations at both token and sentence levels. Manual error analysis identified instances where NLP tools failed to generate accurate annotations. A comparison of time spent per layer revealed that refining a pre-annotated subset of sentences required significantly less time than annotating them manually from scratch. This process resulted in an English reference dataset, tailored for the development of a hypergraph-based knowledge extraction model, known as the Natural Language 2 Semantic Hypergraph Dataset (NL2SH 1.0), which is accessible through CLARIN. |
| DOI: | 10.1007/s10579-025-09835-2 |
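
The abstract's pre-annotation step can be illustrated with spaCy, the first of the three pre-trained pipelines it names. The sketch below is a minimal reconstruction, assuming a standard English pipeline (`en_core_web_sm`) and a CoNLL-U-style row layout; the paper's actual model choice and export format are not given in this record.

```python
# Sketch of the spaCy pre-annotation step described in the abstract
# (lemma, UPOS/XPOS, dependency HEAD/DEP, NER). The CoNLL-U-like row
# layout and the en_core_web_sm model are assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")  # any pre-trained English pipeline

def pre_annotate(sentence: str) -> list[dict]:
    """Produce noisy automatic annotations for one sentence,
    to be post-edited by human annotators."""
    doc = nlp(sentence)
    rows = []
    for tok in doc:
        rows.append({
            "FORM": tok.text,
            "LEM": tok.lemma_,    # lemmatization layer
            "UPOS": tok.pos_,     # universal POS tag
            "XPOS": tok.tag_,     # language-specific POS tag
            # spaCy marks the root by pointing it at itself; 0 = root,
            # otherwise a 1-based head index (valid for a one-sentence doc)
            "HEAD": 0 if tok.head is tok else tok.head.i + 1,
            "DEP": tok.dep_,      # dependency relation
            "NER": tok.ent_type_ or "O",
        })
    return rows

for row in pre_annotate("The cat sat on the mat."):
    print(row)
```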
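
The reported token-level IAA of 0.91 averages agreement across the three annotators, though the record does not name the coefficient used. A minimal sketch, assuming average pairwise Cohen's kappa computed with scikit-learn:

```python
# Illustrative token-level IAA computation. The abstract reports an
# average IAA of 0.91 at the token level but does not say which
# coefficient was used; average pairwise Cohen's kappa is one common
# choice and is assumed here.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def average_pairwise_kappa(annotations: dict[str, list[str]]) -> float:
    """annotations maps annotator name -> one label per token."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohen_kappa_score(a, b) for a, b in pairs) / len(pairs)

# Toy example: three annotators labelling the same six tokens
labels = {
    "ann1": ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"],
    "ann2": ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"],
    "ann3": ["DET", "NOUN", "AUX",  "ADP", "DET", "NOUN"],
}
print(f"average pairwise kappa: {average_pairwise_kappa(labels):.2f}")
```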
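
The machine-vs-human comparison via Accuracy, Precision, Recall, and F1 can be sketched at the token level as below; macro averaging and scikit-learn are assumptions, since the record does not state the evaluation setup.

```python
# Sketch of the machine-vs-human evaluation named in the abstract
# (Accuracy, Precision, Recall, F1 at the token level). Macro
# averaging is an assumption; the record does not specify it.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

gold = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]   # post-edited human labels
pred = ["DET", "NOUN", "AUX",  "ADP", "DET", "NOUN"]   # raw machine labels

acc = accuracy_score(gold, pred)
p, r, f1, _ = precision_recall_fscore_support(
    gold, pred, average="macro", zero_division=0
)
print(f"Accuracy={acc:.2f}  Precision={p:.2f}  Recall={r:.2f}  F1={f1:.2f}")
```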