A Domain Meta-wrapper Using Seeds for Intelligent Author List Extraction in the Domain of Scholarly Articles
| Title: | A Domain Meta-wrapper Using Seeds for Intelligent Author List Extraction in the Domain of Scholarly Articles |
|---|---|
| Authors: | Cauteruccio, F.; Ianni, Giovambattista |
| Source: | Lecture Notes in Computer Science, ISBN: 9783642405006 |
| Publisher information: | Springer Berlin Heidelberg, 2013. |
| Publication year: | 2013 |
| Description: | In this paper we investigate the automated extraction of author lists in the domain of scientific digital libraries. Given a list of known “seed” authors, we aim to extract complete lists of co-authors from Web pages in arbitrary formats. We adopt a methodology that embeds domain knowledge in a single “meta-wrapper”, which requires no training, has negligible maintenance costs, and is based on the combination of several extraction techniques. These techniques are applied at the structural level, the character level, and the annotation level. We describe the methodology, illustrate our tool, compare it with known approaches, and measure the accuracy of our techniques experimentally. |
| Document type: | Part of book or chapter of book; Article; Conference object |
| DOI: | 10.1007/978-3-642-40501-3_31 |
| DOI: | 10.1007/978-3-642-40501-3 |
| Access URL: | https://rd.springer.com/chapter/10.1007/978-3-642-40501-3_31 https://link.springer.com/content/pdf/10.1007%2F978-3-642-40501-3_31.pdf https://link.springer.com/chapter/10.1007/978-3-642-40501-3_31 http://www.mat.unical.it/ianni/storage/HCalc-TR-2013-1-Long.pdf https://dblp.uni-trier.de/db/conf/ercimdl/tpdl2013.html#CauteruccioI13 http://link.springer.com/chapter/10.1007/978-3-642-40501-3_31 https://hdl.handle.net/11386/4852482 https://doi.org/10.1007/978-3-642-40501-3_31 https://hdl.handle.net/11386/4852477 |
| Accession number: | edsair.doi.dedup.....a7f3ef0670a8dcf7c67901493f45fcdc |
| Database: | OpenAIRE |