Evaluation of Cohen's kappa and other measures of inter-rater agreement for genre analysis and other nominal data

Bibliographic Details
Published in: Journal of English for Academic Purposes, Vol. 53, article 101026
Main authors: Rau, Gerald; Shih, Yu-Shan
Format: Journal Article
Language: English
Publication details: Amsterdam: Elsevier Ltd, 01.09.2021
ISSN: 1475-1585, 1878-1497
Description
Summary: Cohen's kappa (κ) is often recommended for nominal data as a measure of inter-rater (inter-coder) agreement or reliability. In this paper we ask which term is appropriate in genre analysis, what statistical measures are valid to measure it, and how much the choice of units affects the values obtained. We find that although both agreement and reliability may be of interest, only agreement can be measured with nominal data. Moreover, while kappa may be appropriate for macrostructure or corpus analysis, it is inappropriate for move or component analysis, because κ requires that the units be predetermined, fixed, and independent. κ further assumes that all disagreements in category assignment are equally likely, which may not be true. We also describe other measures, including correlation, chi-square, and percent agreement, and demonstrate that despite its limitations, percent agreement is the only valid measure in many situations. Finally, we demonstrate why the choice of unit has a large effect on the value calculated. These findings also apply to other studies in applied linguistics using nominal data. We conclude that the methodology used needs to be clearly explained to ensure that the requirements have been met, as in any other statistical testing.

Highlights:
• For nominal data, only inter-rater agreement is valid; correlation is invalid.
• Kappa may be suitable for macrostructure or corpus analysis, but not move analysis.
• Rater-determined move boundaries preclude predetermined, fixed coding units.
• Neither sequential sentences nor semi-ordered categories are independent.
• Details must be reported to ensure statistical testing requirements are met.
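The summary contrasts raw percent agreement with Cohen's kappa, which corrects observed agreement p_o for the agreement p_e expected by chance: κ = (p_o - p_e) / (1 - p_e). The following minimal Python sketch (not from the paper; the rater data and move labels are hypothetical, for illustration only) computes both measures for two raters assigning nominal labels to the same fixed units:

    from collections import Counter

    def percent_agreement(r1, r2):
        # Share of units on which the two raters assign the same category.
        return sum(a == b for a, b in zip(r1, r2)) / len(r1)

    def cohens_kappa(r1, r2):
        # kappa = (p_o - p_e) / (1 - p_e), where p_e is the chance agreement
        # expected from each rater's marginal category frequencies.
        # Only meaningful if units are predetermined, fixed, and independent.
        n = len(r1)
        p_o = percent_agreement(r1, r2)
        m1, m2 = Counter(r1), Counter(r2)
        p_e = sum((m1[c] / n) * (m2[c] / n) for c in set(r1) | set(r2))
        return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1

    # Hypothetical example: two raters code six sentences with move labels.
    rater1 = ["Intro", "Method", "Method", "Result", "Result", "Discussion"]
    rater2 = ["Intro", "Intro",  "Method", "Result", "Result", "Discussion"]
    print(percent_agreement(rater1, rater2))  # 0.833...
    print(cohens_kappa(rater1, rater2))       # ~0.778 after chance correction

Here κ is lower than percent agreement because part of the observed agreement is attributed to chance. The paper's point is that this correction is valid only when κ's unit assumptions hold, which rater-determined move boundaries violate; in such cases percent agreement, despite its limitations, may be the only valid measure.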
DOI: 10.1016/j.jeap.2021.101026