Abstract

Scoring is a fundamental step in the assessment of writing performance. The choice of scoring procedure, as well as the adoption of a discrepancy resolution method, can affect the psychometric properties of the scores and therefore the final pass/fail decision. Within a comprehensive framework that treats scoring as part of the score validation process, this paper evaluates the impact of the rater mean, parity and tertium quid procedures on score properties. Using data from a writing assessment task administered in a professional context, the paper analyses score reliability, dependability, unidimensionality and decision accuracy on two sets of data: the complete data and a subsample of discrepant data. The results show that the tertium quid procedure performs better on reliability indicators but less well in defining construct unidimensionality.
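
The abstract names the three resolution procedures without defining them. The sketch below illustrates how they are commonly described in the score-resolution literature; it is not taken from the paper itself. It assumes two original ratings plus one adjudicator rating on a numeric scale. The function names, the discrepancy tolerance of one scale point, and the tie-breaking rule in `tertium_quid` are hypothetical choices made for illustration only.

```python
from statistics import mean


def is_discrepant(r1, r2, tolerance=1):
    """Flag a pair of ratings as discrepant when they differ by more than
    `tolerance` scale points (the threshold is an assumed value)."""
    return abs(r1 - r2) > tolerance


def rater_mean(r1, r2):
    """Rater mean: average the two original ratings; no adjudicator is used."""
    return mean([r1, r2])


def parity(r1, r2, adjudicator):
    """Parity: the adjudicator's rating counts equally with the two originals."""
    return mean([r1, r2, adjudicator])


def tertium_quid(r1, r2, adjudicator):
    """Tertium quid: average the adjudicator's rating with the closer of the
    two original ratings and discard the other.
    Assumption: if both originals are equally close, the first one is kept."""
    closest = min((r1, r2), key=lambda r: abs(r - adjudicator))
    return mean([closest, adjudicator])


if __name__ == "__main__":
    r1, r2, adjudicator = 2, 5, 4  # made-up ratings, not data from the study
    if is_discrepant(r1, r2):
        print("rater mean:  ", rater_mean(r1, r2))
        print("parity:      ", parity(r1, r2, adjudicator))
        print("tertium quid:", tertium_quid(r1, r2, adjudicator))
```

With the made-up ratings above, the three procedures give different resolved scores for the same essay, which is why the choice of procedure can shift a candidate across a pass/fail cut score.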

Journal
Assessing Writing
Published
2022-10-01
DOI
10.1016/j.asw.2022.100669
Open Access
OA PDF Hybrid

