A Domain Meta-wrapper Using Seeds for Intelligent Author List Extraction in the Domain of Scholarly Articles

Detailed bibliography
Title:	A Domain Meta-wrapper Using Seeds for Intelligent Author List Extraction in the Domain of Scholarly Articles
Authors:	Cauteruccio, F.; Ianni, Giovambattista
Source:	Lecture Notes in Computer Science ISBN: 9783642405006
Publisher information:	Springer Berlin Heidelberg, 2013.
Publication year:	2013
Description:	In this paper we investigate the automated extraction of author lists in the domain of scientific digital libraries. Given a list of known “seed” authors, we aim to extract complete lists of co-authors from Web pages in arbitrary formats. We adopt a methodology that embeds domain knowledge in a unique “meta-wrapper”, which requires no training, has negligible maintenance costs, and combines several extraction techniques. These techniques are applied at the structural level, the character level, and the annotation level. We describe the methodology, illustrate our tool, compare it with known approaches, and measure the accuracy of our techniques through experiments.
Document type:	Part of book or chapter of book; Article; Conference object
DOI:	10.1007/978-3-642-40501-3_31
DOI:	10.1007/978-3-642-40501-3
Access URL:	https://rd.springer.com/chapter/10.1007/978-3-642-40501-3_31 https://link.springer.com/content/pdf/10.1007%2F978-3-642-40501-3_31.pdf https://link.springer.com/chapter/10.1007/978-3-642-40501-3_31 http://www.mat.unical.it/ianni/storage/HCalc-TR-2013-1-Long.pdf https://dblp.uni-trier.de/db/conf/ercimdl/tpdl2013.html#CauteruccioI13 http://link.springer.com/chapter/10.1007/978-3-642-40501-3_31 https://hdl.handle.net/11386/4852482 https://doi.org/10.1007/978-3-642-40501-3_31 https://hdl.handle.net/11386/4852477
Accession number:	edsair.doi.dedup.....a7f3ef0670a8dcf7c67901493f45fcdc
Database:	OpenAIRE