The impact of task duration on the scoring of independent writing responses of adult L2-English writers

Ben Naismith Advanced Cooling Technologies (United States) ; Yigal Attali Advanced Cooling Technologies (United States) ; Geoffrey T. LaFlair Advanced Cooling Technologies (United States)

Abstract

In writing assessment, there is an inherent tension between authenticity and practicality: tasks with longer durations may more closely reflect real-life writing processes but are less feasible to administer and score. Moreover, for a fixed total testing time, there is necessarily a trade-off between task duration and number of tasks. Traditionally, high-stakes assessments have managed this trade-off by administering one or two writing tasks per test, allowing 20–40 minutes per task. However, research on second language (L2) English writing has not found longer task durations to significantly improve score validity or reliability. Importantly, very few studies have compared much shorter durations for writing tasks to more traditional allotments. To explore this issue, we asked adult L2-English test takers to respond to two writing prompts with either 5-minute or 20-minute time limits. Responses were then evaluated by expert human raters and an automated writing evaluation tool. Regardless of scoring method, short-duration scores showed test-retest reliability and criterion validity as high as those of long-duration scores. As expected, longer task duration yielded higher scores, but regardless of duration, test takers demonstrated the entire spectrum of writing proficiency. Implications for writing assessment are discussed in relation to scoring practices and task design.

• Longer writing tasks do not have higher test-retest reliability than shorter ones.
• Longer writing tasks do not have higher criterion validity than shorter ones.
• The impact of task duration is not mediated by scoring method (human or machine).

Journal: Assessing Writing
Published: 2024-10-01
DOI: 10.1016/j.asw.2024.100895
Open Access: Hybrid
