Syntactic methods for topic-independent authorship attribution

The efficacy of syntactic features for topic-independent authorship attribution is evaluated, taking a feature set of frequencies of words and punctuation marks as baseline. The features are ‘deep’ in the sense that they are derived by parsing the subject texts, in contrast to ‘shallow’ syntactic fe...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:Natural language engineering Ročník 23; číslo 5; s. 789 - 806
Hlavní autori: BJÖRKLUND, JOHANNA, ZECHNER, NIKLAS
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: Cambridge, UK Cambridge University Press 01.09.2017
Predmet:
ISSN:1351-3249, 1469-8110, 1469-8110
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:The efficacy of syntactic features for topic-independent authorship attribution is evaluated, taking a feature set of frequencies of words and punctuation marks as baseline. The features are ‘deep’ in the sense that they are derived by parsing the subject texts, in contrast to ‘shallow’ syntactic features for which a part-of-speech analysis is enough. The experiments are made on two corpora of online texts and one corpus of novels written around the year 1900. The classification tasks include classical closed-world authorship attribution, identification of separate texts among the works of one author, and cross-topic authorship attribution. In the first tasks, the feature sets were fairly evenly matched, but for the last task, the syntax-based feature set outperformed the baseline feature set. These results suggest that, compared to lexical features, syntactic features are more robust to changes in topic.
Bibliografia:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1351-3249
1469-8110
1469-8110
DOI:10.1017/S1351324917000249