Abstract

Comparative Judgement (CJ) is a pairwise comparison evaluation method, typically conducted online. Multiple judges each compare the quality of a series of paired performances; from their decisions, a rank order is constructed and scores are calculated. Research across different educational contexts supports CJ’s reliability for evaluating written performances, permitting more precise scoring of scripts and dimension-focused evaluation. However, little is known about the basis of judges’ evaluations. This issue is important because argument-based approaches to validation (common in the field of language testing and adopted in this study) require evidence to support claims that scores are appropriate for the test’s purpose. Therefore, we investigate the scoring validity of CJ, both when used holistically (the standard application of CJ) and when evaluating scripts by individual criteria (termed dimensions in the research context). Twenty-seven judges evaluated 300 scripts addressing two writing task types in a national English as a Foreign Language examination for young learners in Austria. Judges reported via questionnaires what they had focused on while judging. Subsequently, eight judges provided think-aloud data while evaluating 157 scripts, offering further insight into the writing features they considered and their decision-making during CJ. Findings showed that while most judges adopted a decision-making process similar to traditional rating methods, some adapted their method to accommodate the nature of CJ evaluation. Furthermore, results indicated that the judges considered construct-relevant criteria when using CJ, both holistically and by dimension, thus offering support to an argument for the appropriateness of using CJ in this context.

• Comparative Judgement can offer an alternative to analytic rating of EFL writing.
• Judges with teaching or rating experience largely focus on relevant text features.
• Some judges adopt a decision-making process that appears well suited to CJ.
• Dimension-based CJ has the potential to provide richer feedback than holistic CJ.
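
The abstract notes that a rank order is constructed and scores are calculated from judges' pairwise decisions, but it does not say which statistical model was used. As an illustration only, the sketch below fits a Bradley-Terry model, a common choice in CJ work, using a simple iterative update. The function name `fit_bradley_terry`, the script labels, and the example decisions are all hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Estimate logit-scale quality scores from (winner, loser) decisions.

    Assumption: a plain Bradley-Terry model fitted by iterative
    minorise-maximise updates; the study's actual model is not stated.
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        if winner == loser:
            raise ValueError("a script cannot be compared with itself")
        wins[winner] += 1
        losses[loser] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scripts = sorted(set(wins) | set(losses))
    for s in scripts:
        # Without at least one win and one loss the estimate is not finite.
        if wins[s] == 0 or losses[s] == 0:
            raise ValueError(f"script {s!r} needs at least one win and one loss")

    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iter):
        updated = {}
        for s in scripts:
            denom = 0.0
            for pair, n in pair_counts.items():
                if s in pair:
                    (other,) = pair - {s}
                    denom += n / (strength[s] + strength[other])
            updated[s] = wins[s] / denom
        # Fix the scale: geometric mean of strengths is 1 (logits sum to 0).
        scale = math.exp(sum(math.log(v) for v in updated.values()) / len(updated))
        updated = {s: v / scale for s, v in updated.items()}
        shift = max(abs(math.log(updated[s]) - math.log(strength[s])) for s in scripts)
        strength = updated
        if shift < tol:
            break
    return {s: math.log(v) for s, v in strength.items()}


# Hypothetical decisions: each tuple is (preferred script, other script).
decisions = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
scores = fit_bradley_terry(decisions)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `scores` holds one logit-scale value per script and `ranking` lists the scripts from strongest to weakest. The sketch assumes the comparison data are connected and that every script both wins and loses at least once; operational CJ tools usually handle those edge cases and also report reliability statistics, which this example omits.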

Journal
Assessing Writing
Published
2025-10-01
DOI
10.1016/j.asw.2025.100986
Open Access
OA PDF Hybrid

