A Library Perspective on Nearly-Unsupervised Information Extraction Workflows in Digital Libraries

Detailed bibliography
Published in:	Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1-11
Main authors:	Kroll, Hermann; Pirklbauer, Jan; Plotzky, Florian; Balke, Wolf-Tilo
Medium:	Conference paper
Language:	English
Published:	ACM, 20 June 2022
Subjects:	Cleaning; Digital Libraries; Encyclopedias; Libraries; Open Information Extraction; Reliability engineering; Scalability; Semantics; Training data; Workflows

Description
Abstract:	Information extraction can support novel and effective access paths for digital libraries. Nevertheless, designing reliable extraction workflows can be cost-intensive in practice. On the one hand, suitable extraction methods rely on domain-specific training data. On the other hand, unsupervised and open extraction methods usually produce non-canonicalized extraction results. This paper tackles the question of how digital libraries can handle such extractions and whether their quality is sufficient in practice. We focus on unsupervised extraction workflows by analyzing them in case studies in the domains of encyclopedias (Wikipedia), pharmacy, and political sciences. We report on opportunities and limitations. Finally, we discuss best practices for unsupervised extraction workflows.
CCS Concepts:	Information systems → Information extraction; Data extraction and integration; Document representation.
DOI:	10.1145/3529372.3530924