Comparative judgment in L2 writing assessment: Reliability and validity across crowdsourced, community-driven, and trained rater groups of judges

Peter Thwaites (Center for Applied Linguistics); Pauline Jadoulle (Center for Applied Linguistics); Magali Paquot (Center for Applied Linguistics)
Journal: Assessing Writing
Published: 2025-07-01
DOI: 10.1016/j.asw.2025.100937
Open Access: Closed

Citation Context

Cited by in this index (1)

  1. Assessing Writing

References (52) · 1 in this index

  1. Crowdsourcing relative rankings of multi-word expressions: experts versus non-experts
  2. Rater cognition: implications for validity
    Educational Measurement: Issues and Practice  
  3. Is comparative judgement just a quick form of multiple marking?
    Research Matters: A Cambridge Assessment Publication
  4. Learning words with unfamiliar orthography: The role of cognitive abilities
    Studies in Second Language Acquisition
  5. Measuring conceptual understanding using comparative judgement
    International Journal of Research in Undergraduate Mathematics Education  
  1. The effect of adaptivity on the reliability coefficient in adaptive comparative judgement
    Assessment in Education: Principles, Policy & Practice
  2. Exploring the validity of comparative judgement: Do judges attend to construct-irrelevant…
    Frontiers in Education  
  3. Statistical power analysis for the behavioral sciences
    Statistical Power Analysis for the Behavioral Sciences
  4. Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR): A Manual
  5. Crowd-sourcing human ratings of linguistic production
    Proceedings of the Annual Meeting of the Cognitive Science Society
  6. Data quality in online human-subjects research: Comparisons between MTurk, Prolific, Clou…
    PLOS ONE  
  7. Protocol analysis: verbal reports as data
  8. Everitt, B.S., & Skrondal, A. (2010). The Cambridge dictionary of statistics. 〈http://196.43.179.6:8080/xmlui…
  9. Gravetter, F.J., & Wallnau, L.B. (2017). Statistics for the Behavioral Sciences (10th ed.). Cengage. 〈https:/…
  10. Exploring language assessment and testing: language in action
  11. A comparative judgment approach to assessing Chinese Sign Language interpreting
    Language Testing  
  12. Jones, I. (2022). Sirt functions [Computer software]. 〈https://github.com/NoMoreMarking/sirt/blob/main/R/sirt…
  13. Comparative judgement in education research
    International Journal of Research & Method in Education
  14. The problem of assessing problem solving: can comparative judgement help?
    Educational Studies in Mathematics  
  15. Jones, I., & Inglis, M. (2023). The validity of comparative judgement: A comment on Kelly, Richardson and Isa…
  16. Fifty years of A-level mathematics: Have standards changed?
    British Educational Research Journal  
  17. Critiquing the rationales for using comparative judgement: A call for clarity
    Assessment in Education: Principles, Policy & Practice
  18. The measurement of observer agreement for categorical data
    Biometrics  
  19. Assessing the quality of argumentative texts: examining the general agreement between dif…
    Frontiers in Education  
  20. Validity of comparative judgment scores: How assessors evaluate aspects of text quality w…
    Frontiers in Education  
  21. When teachers compare argumentative texts: Decisions informed by multiple complex aspects…
    L1-Educational Studies in Language and Literature  
  22. Validity
    In Educational measurement
  23. Crowdsourced adaptive comparative judgment: A community-based solution for proficiency rating
    Language Learning  
  24. Proficiency reporting practices in research on second language acquisition: Have we made …
    Language Learning  
  25. Data quality of platforms and panels for online behavioral research
    Behavior Research Methods  
  26. The classification accuracy and consistency of comparative judgement of writing compared …
    Research in Education  
  27. R Core Team. (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Co…
  28. Robitzsch, A. (2023). sirt: Supplementary Item Response Theory Models. 〈https://CRAN.R-project.org/package=sirt〉.
  29. Feasibility of using comparative judgement and student judges to assess writing performan…
    Journal of Pedagogical Research  
  30. Rubric rating with MFRM versus randomly distributed comparative judgment: A comparison of…
    Educational Measurement: Issues and Practice  
  31. Using two-alternative forced choice tasks and Thurstone’s law of comparative judgments fo…
    Linguistic Approaches to Bilingualism  
  32. Evaluating comparative judgment as an approach to essay scoring
    Applied Measurement in Education  
  33. Assessment of L2 proficiency in second language acquisition research
    Language Learning  
  34. Research synthesis and historiography
    Synthesizing Research on Language Learning and Teaching  
  35. A law of comparative judgment
    Psychological Review  
  36. Assessing Writing
  37. Comparative judgment for advancing research in applied linguistics
    Research Methods in Applied Linguistics  
  38. Thwaites, P., Kollias, C., & Paquot, M. (under review). Testing crowdsourcing as a means of recruitment for t…
  39. Thwaites, P., Vandeweerd, N., & Paquot, M. (2024). Crowdsourced comparative judgement for evaluating learner …
  40. Proficiency assessment standards in second language acquisition research: “Clozing” the Gap
    Studies in Second Language Acquisition  
  41. Validity of comparative judgement to assess academic writing: Examining implications of i…
    Assessment in Education: Principles, Policy & Practice
  42. Using crowdsourced comparative judgement and rubric-based rating to grade texts in the IC…
    Learner Corpus Research 2024
  43. A meta-analysis on the reliability of comparative judgement
    Assessment in Education: Principles, Policy & Practice
  44. Scale separation reliability: What does it mean in the context of comparative judgment?
    Applied Psychological Measurement  
  45. Moderation of Non-exam assessments: Is Comparative Judgement a practical alternative?
  46. A comparative judgement approach to the large-scale assessment of primary writing in England
    Assessment in Education: Principles, Policy & Practice