Assessing Writing

14 articles
Topic: race and writing

April 2026

  1. Assessing fairness in finetuned scoring models with demographically restricted training data
    Abstract

    The increasing adoption of automated essay scoring (AES) in high-stakes educational contexts necessitates careful examination of potential biases within these systems. This study investigates how the demographic composition of training data influences fairness in AES systems developed from finetuned large language models (LLMs). Using the PERSUADE corpus of 26,000 student essays, we conducted a systematic analysis using demographically restricted training sets to isolate the impact of training data demographics on LLM-AES performance. Each demographically restricted training set comprised essays written by a single racial/ethnic group. Four variants of a Longformer-based AES were developed: one trained on demographically balanced data and three trained on demographically restricted datasets. An initial analysis of the human ratings indicated that demographic factors significantly predict human essay scores (marginal R² = 0.125), a pattern paralleled in national writing assessment data. LLM-AES systems trained on demographically restricted data exhibited small systematic biases (marginal R² = 0.043). However, the LLM trained on balanced data showed minimal demographic bias, suggesting that representative training data can effectively prevent amplification of demographic disparities beyond those present in human ratings. These results highlight both the importance and the limitations of training data diversity in achieving fair assessment outcomes.

    • 12.5% of variance in human essay ratings was explained by demographics.
    • We constructed demographically restricted training sets to isolate bias.
    • Balanced training data minimized LLM-AES bias across demographic groups.
    • LLM-AES systems trained on demographically restricted data showed more bias.
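
    A minimal sketch of the setup the abstract describes: one demographically restricted training set per racial/ethnic group, a balanced set drawing equally from each group, and a Longformer regression head for holistic scoring. The file name and column names are hypothetical, and the fine-tuning details are not given in the abstract.

    ```python
    # Sketch: demographically restricted vs. balanced training sets for a
    # Longformer-based AES regressor. File and column names are hypothetical.
    import pandas as pd
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    df = pd.read_csv("persuade_essays.csv")  # hypothetical corpus export

    # One training set per racial/ethnic group (demographically restricted).
    restricted = {group: sub for group, sub in df.groupby("race_ethnicity")}

    # A balanced set: sample the same number of essays from each group.
    n = min(len(sub) for sub in restricted.values())
    balanced = pd.concat([sub.sample(n=n, random_state=0) for sub in restricted.values()])

    # Longformer with a single regression output for essay scores.
    tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
    model = AutoModelForSequenceClassification.from_pretrained(
        "allenai/longformer-base-4096",
        num_labels=1,               # scalar holistic score
        problem_type="regression",  # MSE loss during fine-tuning
    )
    # Fine-tuning (e.g., with transformers.Trainer) is then run once per
    # training set, yielding the four model variants the study compares.
    ```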

    doi:10.1016/j.asw.2026.101032

January 2025

  1. Examining the use of academic vocabulary in first-year ESL undergraduates’ writing: A corpus-driven study in Hong Kong
    Abstract

    A good command of academic vocabulary is important for academic success in higher education. However, research has primarily focused on the receptive academic vocabulary knowledge of L2 learners while devoting relatively limited attention to their productive use of such vocabulary and its impact on writing quality. To address this gap, we analysed the problem-solution essays written by 168 first-year undergraduates in Hong Kong, focusing on the relationship between their use of academic words in the Academic Vocabulary List (AVL) and the overall quality of their writing. We also explored the relationship between the size of students’ receptive academic vocabulary and the frequency of its use in writing. Findings revealed that high-scoring essays contained a greater density and diversity of academic vocabulary than low-scoring essays, with greater frequency of words in the 1–500 and 501–1000 tiers of the AVL significantly predicting better writing quality. The essays also showed a significant relationship between the participants’ receptive academic vocabulary size and the diversity of academic words used in writing. However, no significant relationship was observed between receptive academic vocabulary size and the density of academic words used. We highlight the implications of these findings for EAP teaching and research.

    • Problem-solution essays written by undergraduates in Hong Kong were analysed.
    • Density and diversity of academic vocabulary (AV) predict L2 writing quality.
    • Learners’ receptive AV size significantly relates to AV diversity in their writing.
    • Only words from two tiers of the AVL significantly predicted writing scores.
    • A holistic and tiered approach to assessing AV use is important.
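
    One way to compute the density, diversity, and tier measures the abstract reports, as a rough sketch. The AVL file, its format, and the tokenization are assumptions; the study’s actual operationalization is not specified here.

    ```python
    # Sketch: density, diversity, and tier counts of Academic Vocabulary List
    # (AVL) words in one essay. Assumes a hypothetical "avl_ranked.txt" with
    # one AVL word per line, ordered by frequency rank.
    import re

    with open("avl_ranked.txt", encoding="utf-8") as f:
        words = [w.strip().lower() for w in f if w.strip()]
    rank = {w: i + 1 for i, w in enumerate(words)}

    def avl_profile(essay: str) -> dict:
        tokens = re.findall(r"[a-z']+", essay.lower())
        hits = [t for t in tokens if t in rank]
        return {
            "density": len(hits) / len(tokens) if tokens else 0.0,  # AVL tokens / all tokens
            "diversity": len(set(hits)),                            # distinct AVL types
            "tier_1_500": sum(rank[t] <= 500 for t in hits),
            "tier_501_1000": sum(500 < rank[t] <= 1000 for t in hits),
        }
    ```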

    doi:10.1016/j.asw.2024.100913

October 2024

  1. Effects of a genre and topic knowledge activation device on a standardized writing test performance
    Abstract

    The aim of this article was twofold: first, to introduce a design for a writing test intended for application in large-scale assessments of writing, and second, to experimentally examine the effects of employing a device for activating prior knowledge of topic and genre as a means of controlling construct-irrelevant variance and enhancing validity. An authentic, situated writing task was devised, offering students a communicative purpose and a defined audience. Two devices were utilized for the cognitive activation of topic and genre knowledge: an infographic and a genre model. The participants in this study were 162 fifth-grade students from Santiago de Chile, with 78 students assigned to the experimental condition (with activation device) and 84 students assigned to the control condition (without activation device). The results show that the odds of demonstrating good writing ability are higher for students in the experimental group, even when controlling for text transcription ability, considered a predictor of writing. These findings hold implications for the development of large-scale tests of writing guided by principles of educational and social justice.

    • Genre and topic knowledge are forms of prior knowledge relevant to writing.
    • Higher odds of better writing for students exposed to prior knowledge activation.
    • Results support the use of prior knowledge activation in standardized assessment.
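
    The reported effect is an odds comparison with a covariate, which maps onto a logistic regression of the following form. The data file and variable names are hypothetical.

    ```python
    # Sketch: logistic model of writing ability on condition, controlling for
    # text transcription ability. Column names are hypothetical:
    # good_writing (0/1), condition (1 = activation device, 0 = control),
    # transcription (covariate).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("writing_test_results.csv")  # hypothetical data file

    fit = smf.logit("good_writing ~ condition + transcription", data=df).fit()
    print(fit.summary())
    # Odds ratios; a value > 1 for `condition` favors the experimental group.
    print(np.exp(fit.params))
    ```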

    doi:10.1016/j.asw.2024.100898

July 2024

  1. Navigating innovation and equity in writing assessment
    doi:10.1016/j.asw.2024.100873

January 2023

  1. A multi-measure approach for lexical diversity in writing assessments: Considerations in measurement and timing
    doi:10.1016/j.asw.2022.100688

July 2022

  1. Diversity of Advanced Sentence Structures (DASS) in writing predicts argumentative writing quality and receptive academic language skills of fifth-to-eighth grade students
    doi:10.1016/j.asw.2022.100649
  2. Investigating whether a flemma count is a more distinctive measurement of lexical diversity
    doi:10.1016/j.asw.2022.100640

January 2022

  1. Appropriateness as an aspect of lexical richness: What do quantitative measures tell us about children's writing?
    Abstract

    Quantitative measures of vocabulary use have added much to our understanding of first and second language writing development. This paper argues for measures of register appropriateness as a useful addition to these tools. Developing an idea proposed by Durrant and Brenchley (2019), it explores what such measures can tell us about vocabulary development in the L1 writing of school children in England and critically examines how results should be interpreted. It shows that significant patterns of discipline- and genre-specific vocabulary development can be identified for measures related to four distinct registers, though the strongest patterns are found for vocabulary associated with fiction and academic writing. Follow-up analyses showed that changes across year groups were primarily driven, not by the nature of individual words, but by the overall quantitative distribution of register-specific vocabulary, suggesting that the traditional distinction between measures of lexical diversity and lexical sophistication may not be helpful for understanding development in this context. Closer analysis of academic vocabulary showed development of distinct vocabularies in Science and English writing in response to sharply differing communicative needs in those disciplines, suggesting that development in children’s academic vocabulary should not be seen as a single coherent process.
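
    A register-appropriateness measure of the kind discussed can be sketched as the share of a text’s tokens that belong to each register’s characteristic vocabulary. The four tiny wordlists below are hypothetical stand-ins for corpus-derived lists.

    ```python
    # Sketch: proportion of a text's tokens drawn from each of four reference
    # registers. The wordlists are illustrative placeholders only.
    import re

    REGISTER_WORDS = {
        "fiction": {"whisper", "gloomy", "stumble"},
        "academic": {"analyse", "hypothesis", "significant"},
        "news": {"report", "official", "announce"},
        "conversation": {"yeah", "stuff", "okay"},
    }

    def register_profile(text: str) -> dict:
        tokens = re.findall(r"[a-z']+", text.lower())
        total = len(tokens) or 1
        return {reg: sum(t in words for t in tokens) / total
                for reg, words in REGISTER_WORDS.items()}
    ```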

    doi:10.1016/j.asw.2021.100596

July 2021

  1. Examining lexical features and academic vocabulary use in adolescent L2 students’ text-based analytical essays
    Abstract

    A rich and complex vocabulary is a crucial component of high-quality writing for academic purposes. However, use of academic vocabulary can be challenging for adolescent L2 writers who are still developing their academic language proficiency. Thus, understanding the lexical needs of adolescent L2 students in composing academic essays is pivotal to supporting this population in their endeavor to become proficient academic writers. This study investigates the lexical features of adolescent L2 students’ text-based analytical essays and analyzes the extent to which lexical density, lexical diversity, and lexical sophistication predict the quality of their writing. The computational tools Coh-Metrix and VocabProfiler were used to obtain quantitative measures of lexical density, diversity, and sophistication. The results indicate that the essays (n = 70), on average, have (1) low lexical density, (2) more repetition of words, indicating less diversity compared to grade-level estimates, and (3) a higher percentage of basic words and a lower percentage of academic words. 44% of the AWL words in the essays come from the source text and prompt. The results of a hierarchical multiple regression indicate that the use of academic vocabulary is a predictor of writing quality. The study has important pedagogical implications for classroom practice at the secondary school level.
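
    A hierarchical regression of this kind enters predictor blocks stepwise and reads off the change in R² at each step. A sketch with hypothetical column names:

    ```python
    # Sketch: hierarchical multiple regression of essay quality on lexical
    # measures. Column names are hypothetical; each block adds predictors and
    # the R-squared change shows that block's unique contribution.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("essay_lexical_measures.csv")  # hypothetical data file

    blocks = [
        "quality ~ lexical_density",
        "quality ~ lexical_density + lexical_diversity",
        "quality ~ lexical_density + lexical_diversity + academic_word_pct",
    ]
    prev = 0.0
    for formula in blocks:
        fit = smf.ols(formula, data=df).fit()
        print(f"{formula}: R2 = {fit.rsquared:.3f} (change = {fit.rsquared - prev:.3f})")
        prev = fit.rsquared
    ```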

    doi:10.1016/j.asw.2021.100540

January 2021

  1. Lexical density and diversity in dissertation abstracts: Revisiting English L1 vs. L2 text differences
    doi:10.1016/j.asw.2020.100511
  2. Investigating minimum text lengths for lexical diversity indices
    doi:10.1016/j.asw.2020.100505

October 2019

  1. Making our invisible racial agendas visible: Race talk in Assessing Writing, 1994–2018
    doi:10.1016/j.asw.2019.100425

January 2019

  1. The influence of lexical features on teacher judgements of ESL argumentative essays
    Abstract

    Numerous studies have examined the relationship between lexical features of students’ compositions and judgements of text quality. However, the degree to which the quality of vocabulary in students’ essays influences teachers’ assessment of other textual characteristics is relatively unexplored. This experimental study investigates the influence of lexical features on teachers’ judgements of English as a second language (ESL) argumentative essays. Using analytic and holistic rating scales, English pre-service teachers (N = 37) in Switzerland assessed four essays of different proficiency levels in which the levels of lexical diversity and sophistication had been experimentally varied. Lexical diversity was manipulated using measures from the Coh-Metrix software (MTLD and D), and differing levels of lexical sophistication were obtained using the word-range measure from the Tool for the Automatic Analysis of Lexical Sophistication (TAALES). The results suggested that texts with greater lexical diversity and sophistication were assessed more positively with respect to their overall quality as well as the analytic criteria ‘grammar’ and ‘frame of essay’. The implications of this study for classroom practice and teacher education are discussed.
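
    MTLD, one of the diversity indices named above, is well defined (McCarthy & Jarvis, 2010): reading through the text, a "factor" completes each time the running type-token ratio falls to 0.72, any leftover stretch contributes a partial factor, and MTLD is the token count divided by the factor count, averaged over forward and reverse passes. A minimal sketch:

    ```python
    # Sketch: MTLD (Measure of Textual Lexical Diversity). A factor completes
    # whenever the running type-token ratio (TTR) reaches the 0.72 threshold;
    # leftover text adds a partial factor of (1 - TTR) / (1 - 0.72).
    def _mtld_pass(tokens, threshold=0.72):
        factors, types, count = 0.0, set(), 0
        for tok in tokens:
            count += 1
            types.add(tok)
            if len(types) / count <= threshold:
                factors += 1.0
                types, count = set(), 0
        if count:  # partial factor for the remainder
            factors += (1.0 - len(types) / count) / (1.0 - threshold)
        return len(tokens) / factors if factors else 0.0

    def mtld(tokens):
        # Average of forward and reverse passes, as in the original measure.
        return (_mtld_pass(tokens) + _mtld_pass(tokens[::-1])) / 2.0
    ```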

    doi:10.1016/j.asw.2018.12.003

January 2012

  1. Linguistic discrimination in writing assessment: How raters react to African American “errors,” ESL errors, and standard English errors on a state-mandated writing exam
    doi:10.1016/j.asw.2011.10.001