Assessing Writing

1016 articles

October 2025

  1. Improving writing feedback quality and self-efficacy of pre-service teachers in Gen-AI contexts: An experimental mixed-method design
    doi:10.1016/j.asw.2025.100960
  2. Lexical richness in young English learners’ writing: A focus on opinion and listen-write task types
    doi:10.1016/j.asw.2025.100975
  3. Linguistic predictors of L2 writing performance: Variations across genres
    doi:10.1016/j.asw.2025.100985
  4. Assessing writing practices in higher education: Characterizing self-reported practices and identifying their determinants
    doi:10.1016/j.asw.2025.100976
  5. Impact of task repetition schedules and emotions on L2 writing performance profiles using latent transition analysis
    doi:10.1016/j.asw.2025.100974
  6. Can generative AI figure out figurative language? The influence of idioms on essay scoring by ChatGPT, Gemini, and Deepseek
    doi:10.1016/j.asw.2025.100981
  7. Understanding the critical thinking experiences of L2 student writers engaged in linguistically supported peer feedback giving
    doi:10.1016/j.asw.2025.100977
  8. Response time for English learners on large-scale writing assessments
    doi:10.1016/j.asw.2025.100979
  9. Comparing GPT-based approaches in automated writing evaluation
    doi:10.1016/j.asw.2025.100961
  10. Assessing L2 writing formality using syntactic complexity indices: A fuzzy evaluation approach
    doi:10.1016/j.asw.2025.100973
  11. Growth mindset and writing engagement: The roles of motivation regulation and engagement with teacher’s written corrective feedback
    doi:10.1016/j.asw.2025.100980
  12. Criterion validity evidence and alternate form reliability of curriculum-based measures of written expression for eighth grade students
    doi:10.1016/j.asw.2025.100958
  13. Judgment accuracy in primary school EFL writing assessment: Do text characteristics matter?
    Abstract

    Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is challenging. This study examined a largely unexplored topic, namely the role of text characteristics in writing assessment, and analysed judgment accuracy separately for nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography, and punctuation). Two hundred pre-service teachers assessed four randomly assigned texts written by sixth-grade learners, and their assessments were compared with existing ratings by two experts from a previous study. Relative judgment accuracy ranged from r = .34 to r = .60 across the nine assessment criteria, with vocabulary assessed significantly more accurately than almost all other criteria. Orthography, punctuation, and the complexity and correctness of syntax and grammar were assessed significantly more accurately than cohesion, level of detail, communicative effect, and coherence. The pre-service teachers assessed most criteria more strictly and with higher variability than the experts. The results suggest that teacher education should give pre-service teachers concrete opportunities to practise writing assessment, include activities that strengthen the assessment of content- and structure-related criteria, and help them adjust their assessment rigour.
    • Judgment accuracy in the assessment of primary school EFL learners’ texts.
    • Relative judgment accuracy between r = .34 and .60 for the different criteria.
    • Significant differences in relative judgment accuracy between assessment criteria.
    • Linguistic text qualities are assessed with more accuracy than content- and structure-related aspects.
    • Pre-service teachers are more rigorous and heterogeneous in rating than experts.

    doi:10.1016/j.asw.2025.100957
  14. Exploring the scoring validity of holistic and dimension-based Comparative Judgements of young learners’ EFL writing
    Abstract

    Comparative Judgement (CJ) is a pairwise comparison evaluation method, typically conducted online. Multiple judges each compare the quality of a series of paired performances, and from their decisions a rank order is constructed and scores are calculated. Research across different educational contexts supports CJ’s reliability for evaluating written performances, which permits more precise scoring of scripts as well as dimension-focused evaluation. However, little is known about the basis of judges’ evaluations. This matters because argument-based approaches to validation (common in language testing and adopted in this study) require evidence that scores are appropriate for the test purpose. We therefore investigate the scoring validity of CJ both when used holistically (the standard application of CJ) and when scripts are evaluated by individual criteria (termed dimensions in the research context). Twenty-seven judges evaluated 300 scripts addressing two writing task types in a national English as a Foreign Language examination for young learners in Austria. Judges reported via questionnaires what they had focused on while judging. Subsequently, eight judges provided think-aloud data while evaluating 157 scripts, offering further insight into the writing features they considered and their decision-making during CJ. Findings showed that while most judges adopted a decision-making process similar to traditional rating methods, some adapted their method to the nature of CJ evaluation. Furthermore, the judges considered construct-relevant criteria when using CJ, both holistically and by dimension, which supports an argument for the appropriateness of using CJ in this context.
    • Comparative Judgement can offer an alternative to analytic rating of EFL writing.
    • Judges with teaching or rating experience largely focus on relevant text features.
    • Some judges adopt a decision-making process that appears well suited to CJ.
    • Dimension-based CJ has the potential to provide richer feedback than holistic CJ.

    doi:10.1016/j.asw.2025.100986
  15. Editorial Board
    doi:10.1016/s1075-2935(25)00091-1
  16. Using ChatGPT to score essays and short-form constructed responses
    doi:10.1016/j.asw.2025.100988
  17. Which gender provides more specific peer feedback? Gender and assessment training’s effects on peer feedback specificity and intrapersonal factors
    Abstract

    This study investigated the effects of assessor gender (male vs. female), fictitious assessee gender (male vs. female), and assessment training (with vs. without) on peer feedback specificity (i.e., localisation and focus) and intrapersonal factors (i.e., trust in the self as an assessor and discomfort). The participants were 240 undergraduate psychology students (nMen = 120, nWomen = 120); half received assessment training and the other half received only the task instructions. Participants were divided into eight subgroups based on training condition and self-reported gender, and provided peer feedback in Eduflow on three writing samples (of poor, average, and excellent quality) attributed to fictitious male or female peer assessees. Analysis of 3017 peer feedback segments revealed that trained and untrained male and female assessors were comparable in most peer feedback specificity categories when assessing fictitious male or female assessees. Nonetheless, female assessors excelled in certain categories of peer feedback specificity, while male assessors showed strengths in other categories. Assessors who received assessment training provided localised peer feedback on all the writing samples. Finally, gender and training did not affect participants’ trust in their abilities or their (dis)comfort when providing peer feedback.

    doi:10.1016/j.asw.2025.100987
  18. Integrating move analysis and sentence reconstruction in automated writing evaluation for L2 academic writers
    doi:10.1016/j.asw.2025.100984
  19. The development of syntactic complexity in integrated writing: A focus on fine-grained measures
    doi:10.1016/j.asw.2025.100983
  20. Exploring the cross-lingual influence of linguistic complexity in second language writing assessment
    doi:10.1016/j.asw.2025.100951
  21. Challenges and opportunities of automated essay scoring for low-proficient L2 English writers
    doi:10.1016/j.asw.2025.100982
  22. GenAI and human assessments of L2 Chinese writing: Interrater reliability and rater bias
    doi:10.1016/j.asw.2025.100989
  23. Predictive validity evidence for a no-stakes, untimed, machine-scored diagnostic writing assessment
    doi:10.1016/j.asw.2025.100978
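    Item 14 above describes Comparative Judgement, in which a rank order and scores are derived from judges' pairwise decisions. The abstract does not say which model produces the scores; CJ tools commonly fit a Bradley-Terry (or equivalent Rasch-type) model, so the Python sketch below assumes that model and estimates it with the standard iterative minorisation-maximisation update. The function name `bradley_terry`, the `(winner, loser)` input format, and the toy judgements are hypothetical illustrations, not the study's data or software.

```python
import math
from collections import defaultdict


def bradley_terry(judgements, n_iter=500):
    """Estimate Bradley-Terry log-strengths from (winner, loser) pairs.

    Hypothetical helper: `judgements` is a list of (winner_id, loser_id)
    tuples, one per pairwise decision made by any judge.
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    pair_counts = defaultdict(int)  # comparisons per unordered pair of scripts
    for winner, loser in judgements:
        if winner == loser:
            raise ValueError("a script cannot be compared with itself")
        wins[winner] += 1
        losses[loser] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scripts = set(wins) | set(losses)
    for s in scripts:
        # A script with no wins (or no losses) has no finite maximum-likelihood
        # strength; this check is necessary, though not sufficient, for one.
        if wins[s] == 0 or losses[s] == 0:
            raise ValueError(f"script {s!r} needs at least one win and one loss")

    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iter):
        updated = {}
        for i in scripts:
            # Sum of n_ij / (p_i + p_j) over every opponent j that i met.
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Strengths are only identified up to a constant factor, so rescale
        # to a geometric mean of 1 (the log-scores then sum to zero).
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {s: v / math.exp(log_mean) for s, v in updated.items()}
    return {s: math.log(v) for s, v in strength.items()}


# Invented toy decisions for three scripts, for illustration only.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"),
              ("B", "A"), ("A", "B"), ("C", "B")]
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

    The log-strengths serve as the scale scores and sorting them gives the rank order. Dimension-based CJ, as described in the abstract, would repeat the same fit separately on the decisions collected for each dimension.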

July 2025

  1. The relationship between executive functions, source use, and integrated writing performance
    doi:10.1016/j.asw.2025.100936
  2. Making things happen: A study of grammatical metaphors in L2 writing scripts
    Abstract

    Grammatical metaphor (GM) (Halliday, 1985) describes how a writer can shift an action or quality into a ‘thing’. As with most senses of metaphor, the goal is to “represent something as something else” (McGrath & Liardét, 2023, p.33). This study investigated the use of GM in Linguaskill writing exam responses across CEFR proficiency levels (below B1 to C1 or above). Using a pre-existing GM list (see McGrath & Liardét, 2023), it examined GM frequency in L2 responses and its correlation with proficiency scores, and qualitatively explored how candidates used GMs in their responses. Results show a moderate positive correlation between proficiency and GM use, a dominance of process-to-thing shifts (e.g., transform→transformation), and GM use emerging from lower to higher proficiency levels. This underscores GM's significance in crafting academically valued meanings in L2 contexts and suggests its potential for informing instructional and assessment practices.
    • Metaphorisation in writing is a useful metric for L2 writing assessment.
    • Evidence suggests that GM frequency correlates with increased performance.
    • Learners progress from emergent arguments to presenting ideas more concisely.
    • The majority of GM shifts were to ‘things’.
    • The study lends further weight to arguments for meaning-based complexity.

    doi:10.1016/j.asw.2025.100939
  3. Promoting cognitive engagement with peer feedback through peer review training: The case of Chinese tertiary-level EFL learners
    doi:10.1016/j.asw.2025.100947
  4. Trinka: Facilitating academic writing through an intelligent writing evaluation system
    doi:10.1016/j.asw.2025.100953
  5. Editorial Board
    doi:10.1016/s1075-2935(25)00054-6
  6. Toward the fair and valid use of curriculum-based measurement for students with intensive writing needs and linguistically diverse backgrounds
    doi:10.1016/j.asw.2025.100948
  7. Using ChatGPT to facilitate vocabulary learning in continuation writing assessment tasks
    doi:10.1016/j.asw.2025.100952
  8. Comparative judgment in L2 writing assessment: Reliability and validity across crowdsourced, community-driven, and trained rater groups of judges
    doi:10.1016/j.asw.2025.100937
  9. Editorial Volume 65
    doi:10.1016/j.asw.2025.100963
  10. Editorial introduction, Assessing Writing Tools & Tech Forum 2025
    doi:10.1016/j.asw.2025.100956
  11. A large-scale corpus for assessing source-based writing quality: ASAP 2.0
    Abstract

    This paper introduces ASAP 2.0, a dataset of ∼25,000 source-based argumentative essays written by U.S. secondary students. The corpus addresses the shortcomings of the original ASAP corpus by including demographic data, consistent scoring rubrics, and source texts. ASAP 2.0 aims to support the development of unbiased, sophisticated automatic essay scoring (AES) systems that can foster improved educational practices by providing summative feedback to students. The corpus is designed for broad accessibility, with the aim of facilitating research into writing quality and biases in AES systems.
    • We introduce the ASAP 2.0 corpus.
    • The corpus contains over 25,000 source-based essays.
    • Each essay is scored for overall writing quality.
    • The corpus can be used to computationally and quantitatively model source-based writing quality.

    doi:10.1016/j.asw.2025.100954
  12. The impact of self-revision, machine translation, and ChatGPT on L2 writing: Raters’ assessments, linguistic complexity, and error correction
    doi:10.1016/j.asw.2025.100950
  13. Potentials and pitfalls of Google Gemini in writing: Implications for educators
    doi:10.1016/j.asw.2025.100955
  14. Unveiling the precursors of negative emotions in second language writing through control-value theory: An explanatory sequential design approach
    doi:10.1016/j.asw.2025.100949
  15. A mixed-methods approach to English-L1 teachers’ implementation of written feedback in EFL classrooms
    doi:10.1016/j.asw.2025.100935

April 2025

  1. Modeling the interplay between teacher support, anxiety and grit in predicting feedback-seeking behavior in L2 writing
    doi:10.1016/j.asw.2025.100920
  2. Towards a better understanding of integrated writing performance: The influence of literacy strategy use and independent language skills
    Abstract

    This study explores the mechanisms through which literacy strategy use and independent language skills (e.g., reading and writing) influence integrated writing (IW) performance. A total of 322 Secondary Four students from four schools in Hong Kong completed single-text reading, multiple-text reading, independent writing, and IW tasks, along with questionnaires on their reading strategy use and IW strategy use. Path analyses revealed that multiple-text reading and independent writing had comparable, significant impacts on IW and mediated the influence of single-text comprehension. In addition, reading strategy use affected IW indirectly through independent literacy skills and IW strategy use, whereas IW strategies exerted a direct influence on IW. Our findings underscore the critical role of language skills in mediating the influence of reading strategies on IW performance among young first language (L1) learners. The implications for research and practice are discussed, emphasizing the complexity of the IW construct and the need for balanced language skills and strategy instruction to enhance IW task performance.
    • A novel exploration of the concurrent effects of strategies and independent skills on IW.
    • Multiple-text reading and independent writing directly influence IW performance.
    • Independent skills mediate the impact of reading strategies on IW performance.
    • Reading strategy use indirectly affects IW through independent skills and IW strategy use.
    • Balanced language skills and strategy instruction are crucial for IW performance.

    doi:10.1016/j.asw.2025.100922
  3. The influence of working memory and proficiency on phraseological growth: A longitudinal study of adjective-noun combinations in Chinese EFL learners’ argumentative writing
    doi:10.1016/j.asw.2025.100915
  4. Predicting inappropriate source use from scores of language use, source comprehension, and organizational features: A study using generalized linear models
    doi:10.1016/j.asw.2025.100934
  5. Assessing academic language in tenth grade essays using natural language processing
    doi:10.1016/j.asw.2025.100921
  6. Designing a rating scale for an integrated reading-writing test: A needs-oriented approach
    Abstract

    In line with current trends in higher education, EAP programmes are held accountable for preparing students for higher education and assessing their readiness for it. Multimodal tasks, including integrated writing (IW) assessments, have therefore seen a resurgence because they arguably mirror academic writing closely. However, constraints on test practicality and variability in the use and format of these assessments mean that rating scales often fall short of substantiating the central claims of IW assessment. We developed an integrated reading-writing scale, designed to assess readiness for first-year humanities and social science courses, taking into account reading-writing requirements and empirical research on IW tests. We approached test development as part of ongoing validation efforts and detail the considerations involved in the scale development process. We argue that alignment with academic writing requirements should guide the development of IW tests, thereby acknowledging and understanding the nuances of academic writing. The paper presents the considerations and decisions in scale design as part of the validation process from the outset, a reminder that assessment is not merely a quantitative exercise but a multifaceted process.
    • The design of a rating scale for first-year undergraduate academic writing is detailed.
    • Emphasis is placed on the role of reading in integrated writing scales.
    • Academic argumentation, rather than solely source-use mechanics, is considered.
    • Implications for construct operationalisation in academic evaluations are offered.

    doi:10.1016/j.asw.2025.100918
  7. Validation of the individual and collective self-efficacy scale for teaching writing in post-secondary faculty
    Abstract

    Faculty actions in the classroom are known to impact student writing self-efficacy and academic achievement. The purpose of this paper was to validate Locke and Johnston’s Individual and Collective Self-Efficacy for Teaching Writing Scales, a tool originally validated with high school teachers, in a new population of post-secondary faculty. Exploratory and confirmatory factor analyses were conducted in two studies with independent samples: multidisciplinary faculty (N = 281) for the exploratory factor analysis (Study 1) and nursing faculty (N = 187) for the confirmatory factor analysis (Study 2). Three factors were identified that maintained the essence of the theoretical structure proposed by Locke and Johnston: Factor 1 was named Context and Process Competencies, Factor 2 Textural Competencies, and Factor 3 Motivational Competencies. The confirmatory factor analysis in Study 2 confirmed this factor structure with acceptable goodness of fit. Learning to be a teacher of writing is a developmental process, and the validation evidence for this measurement tool supports its usefulness in understanding that process.
    • Instructional practices are known to impact student achievement levels.
    • Faculty individual self-efficacy for teaching writing comprises three factors.
    • Faculty undergo a slow process of enculturation into teaching writing.
    • This scale can be used to assess the impact of teacher agency on student outcomes.

    doi:10.1016/j.asw.2025.100923
  8. Editorial
    doi:10.1016/j.asw.2025.100938
  9. Editorial Board
    doi:10.1016/s1075-2935(25)00030-3
  10. How L2 student writers engage with automated feedback: A longitudinal perspective
    doi:10.1016/j.asw.2025.100919
  11. Does student assessment literacy matter between motivational constructs and engagement in L2 writing? A survey of Chinese EFL undergraduates
    doi:10.1016/j.asw.2025.100916
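    Item 2 of the April 2025 issue reports direct and indirect effects estimated with path analysis. As a much-simplified illustration, the Python sketch below computes the single-mediator decomposition with ordinary least squares on observed scores: a is the effect of X on the mediator M, b is the effect of M on Y controlling for X, the indirect effect is a × b, and the total effect equals the direct effect plus the indirect effect. The study's model involved several variables and mediators; the function name `simple_mediation`, the variable roles, and the toy data are hypothetical, and significance testing (for example, bootstrapped confidence intervals for a × b) is omitted.

```python
from statistics import fmean


def _cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_mediation(x, m, y):
    """OLS decomposition of the X -> Y effect through one mediator M."""
    sxx, smm = _cross(x, x), _cross(m, m)
    sxm, sxy, smy = _cross(x, m), _cross(x, y), _cross(m, y)
    det = sxx * smm - sxm ** 2
    if sxx == 0 or det == 0:
        raise ValueError("x and m must vary and must not be perfectly collinear")
    a = sxm / sxx                                # path X -> M
    b = (sxx * smy - sxm * sxy) / det            # path M -> Y, controlling for X
    direct = (smm * sxy - sxm * smy) / det       # path X -> Y, controlling for M
    total = sxy / sxx                            # simple regression of Y on X
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Invented toy scores, e.g. X = reading strategy use, M = independent
# reading skill, Y = IW score (roles are hypothetical).
x = [1, 2, 3, 4, 5]
m = [2, 3, 5, 4, 6]
y = [1, 3, 4, 5, 7]
result = simple_mediation(x, m, y)
print(result)
```

    With OLS on the same data, the identity total = direct + indirect holds exactly, which is what makes the "effect through independent skills" reading of an indirect path well defined in this simple case.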

January 2025

  1. Editorial Board
    doi:10.1016/s1075-2935(25)00015-7