Evaluation of Cohen's kappa and other measures of inter-rater agreement for genre analysis and other nominal data

Bibliographic Details
Published in: Journal of English for Academic Purposes, Vol. 53, Article 101026
Main Authors: Rau, Gerald; Shih, Yu-Shan
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier Ltd, 01.09.2021
ISSN: 1475-1585, 1878-1497
Description
Summary: Cohen's kappa (κ) is often recommended for nominal data as a measure of inter-rater (inter-coder) agreement or reliability. In this paper we ask which term is appropriate in genre analysis, what statistical measures are valid to measure it, and how much the choice of units affects the values obtained. We find that although both agreement and reliability may be of interest, only agreement can be measured with nominal data. Moreover, while kappa may be appropriate for macrostructure or corpus analysis, it is inappropriate for move or component analysis, due to the requirement of κ that the units be predetermined, fixed, and independent. κ further assumes that all disagreements in category assignment are equally likely, which may not be true. We also describe other measures, including correlation, chi square, and percent agreement, and demonstrate that despite its limitations, percent agreement is the only valid measure in many situations. Finally, we demonstrate why choice of unit has a large effect on the value calculated. These findings also apply to other studies in applied linguistics using nominal data. We conclude that the methodology used needs to be clearly explained to ensure that the requirements have been met, as in any other statistical testing.

Highlights:
• For nominal data, only interrater agreement is valid; correlation is invalid.
• Kappa may be suitable for macrostructure or corpus, but not move analysis.
• Rater-determined move boundaries preclude predetermined, fixed coding units.
• Neither sequential sentences nor semi-ordered categories are independent.
• Details must be reported to ensure statistical testing requirements are met.
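
Illustrative sketch (Python, not part of the article): how the two measures contrasted in the abstract, percent agreement and Cohen's kappa, are computed when two coders assign nominal category labels to a fixed set of units. The move labels and codings below are invented for illustration; kappa corrects the observed agreement p_o by the chance agreement p_e expected from each coder's marginal category frequencies.

    from collections import Counter

    def percent_agreement(rater_a, rater_b):
        # Proportion of units to which both raters assign the same category
        return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

    def cohens_kappa(rater_a, rater_b):
        # kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement expected
        # by chance from each rater's marginal category frequencies.
        # As the abstract notes, this presupposes predetermined, fixed,
        # and independent coding units.
        n = len(rater_a)
        p_o = percent_agreement(rater_a, rater_b)
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        categories = set(rater_a) | set(rater_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
        return (p_o - p_e) / (1 - p_e)

    # Invented example: two coders label ten sentences with move categories
    coder_1 = ["M1", "M1", "M2", "M2", "M3", "M3", "M1", "M2", "M3", "M3"]
    coder_2 = ["M1", "M2", "M2", "M2", "M3", "M1", "M1", "M2", "M3", "M3"]

    print(percent_agreement(coder_1, coder_2))  # 0.8
    print(cohens_kappa(coder_1, coder_2))       # approximately 0.70

On these invented codings percent agreement is 0.80 while kappa is roughly 0.70, showing how kappa discounts agreement expected by chance; the article's argument is that this correction is only meaningful when its unit requirements are met.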
DOI: 10.1016/j.jeap.2021.101026