Overview of the author identification task at PAN 2014

Detailed record
Title: Overview of the author identification task at PAN 2014
Authors: Stamatatos E., Daelemans W., Verhoeven B., Potthast M., Stein B., Juola P., Sanchez-Perez M. A., Barron-Cedeno A.
Source: CLEF 2014 Evaluation Labs and Workshop Working Notes Papers, Sheffield, UK, 2014
Publisher information: CEUR-WS, 2014.
Publication year: 2014
Subjects: author identification, author profiling, forensic linguistics, linguistics
Description: The author identification task at PAN-2014 focuses on author verification. As in PAN-2013, each problem consists of a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author. Compared to PAN-2013, a significantly larger corpus was built, comprising hundreds of documents in four natural languages (Dutch, English, Greek, and Spanish) and four genres (essays, reviews, novels, and opinion articles). In addition, more suitable performance measures are used, focusing on the accuracy and the confidence of the predictions as well as on the ability of the submitted methods to leave some problems unanswered when uncertainty is high. To this end, we adopt the c@1 measure, originally proposed for the question answering task. We received 13 software submissions, which were evaluated in the TIRA framework. Detailed evaluation results are presented, with one language-independent approach serving as a challenging baseline. Moreover, we continue the successful practice of the PAN labs of examining meta-models based on the combination of all submitted systems. Last but not least, we provide statistical significance tests to demonstrate the important differences between the submitted approaches.
Document type: Conference object
File description: pdf; application/pdf
Language: English
Access URL: https://repository.uantwerpen.be/docman/irua/9c2840/118298.pdf
https://hdl.handle.net/10067/1182980151162165141
https://hdl.handle.net/11585/709289
Accession number: edsair.dedup.wf.002..12f7cbb4bc5cf889253c4467c9b0240f
Database: OpenAIRE
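The c@1 measure mentioned in the description rewards leaving a problem unanswered rather than answering it wrongly: each unanswered problem is credited at the system's accuracy over all problems, giving c@1 = (n_c + n_u * n_c / n) / n, where n is the number of problems, n_c the number answered correctly and n_u the number left unanswered. The Python sketch below computes it. The convention that a score of exactly 0.5 marks an unanswered problem (above means "same author", below means "different author") and the function name `c_at_1` are assumptions for illustration, not the lab's official evaluation code.

```python
from typing import Sequence


def c_at_1(scores: Sequence[float], truth: Sequence[bool], threshold: float = 0.5) -> float:
    """Compute c@1 for binary author-verification answers (illustrative sketch).

    Assumption: a score exactly equal to `threshold` means the problem is left
    unanswered; a score above it means "same author", below it "different author".
    """
    if len(scores) != len(truth):
        raise ValueError("scores and truth must have the same length")
    n = len(scores)
    if n == 0:
        raise ValueError("at least one problem is required")

    n_correct = 0
    n_unanswered = 0
    for score, same_author in zip(scores, truth):
        if score == threshold:
            n_unanswered += 1          # abstained on this problem
        elif (score > threshold) == same_author:
            n_correct += 1             # answered and the answer matches the ground truth

    # Each unanswered problem is credited at the accuracy rate over all problems (n_c / n).
    return (n_correct + n_unanswered * n_correct / n) / n
```

For example, with four problems where two are answered correctly, one is answered wrongly and one is left unanswered, `c_at_1([0.9, 0.2, 0.5, 0.7], [True, False, True, False])` gives (2 + 1 * 2/4) / 4 = 0.625, whereas plain accuracy would be 2/4 = 0.5. If every problem is answered, c@1 reduces to plain accuracy.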