Assessing Writing
57 articles
January 2026
-
Generative artificial intelligence for automated essay scoring: Exploring teacher agency through an ecological perspective
Abstract
Generative artificial intelligence (AI) is increasingly used in writing assessment, particularly for automated essay scoring (AES) and for generating formative feedback within automated writing evaluation (AWE). While AI-driven AES enhances efficiency and consistency, concerns regarding accuracy, bias, and ethical implications raise critical questions about its role in assessment. This paper examines the impact of generative AI on teacher agency through an ecological perspective, which considers agency as shaped by personal, institutional, and sociocultural factors. The analysis highlights the need for teachers to critically mediate AI-generated scores and feedback to align them with pedagogical goals, ensuring AI functions as an assistive tool rather than a determinant of assessment outcomes. Although AI can streamline assessment, over-reliance risks diminishing teachers’ evaluative expertise and reinforcing biases embedded in AI systems. Ethical concerns, including transparency, data privacy, and fairness, further complicate its adoption. To address these challenges, this paper proposes a framework for responsible AI integration that prioritizes bias mitigation, data security, and teacher-driven decision-making. The discussion concludes with pedagogical implications and directions for future research on AI-assisted writing assessment.
• Teachers can actively mediate AI-generated scores to maintain agency.
• Dependence on AES may weaken teachers’ evaluative skills.
• Bias, data privacy, and AI opacity can undermine teachers’ decision-making.
• AI literacy and hybrid assessment models can promote teacher autonomy.
• A framework for protecting teacher agency in generative AI–based AWE is presented.
-
Unveiling the antecedents of feedback-seeking behavior in L2 writing: The impact of future L2 writing selves and emotions
Abstract
While existing research on second or foreign language (L2) feedback has predominantly focused on the effectiveness of various feedback practices and their impacts on writing performance, limited attention has been devoted to learners’ proactive role in seeking feedback, and how this important yet underexplored construct correlates with conative and affective variables remains insufficiently examined. To help fill that void, we explored the concept of feedback-seeking behavior and its antecedents in L2 writing by examining its correlations with future L2 writing selves and emotions, particularly unpacking the mediating effect of emotions in the emotion-driven chain of “motivation→emotion→increased or decreased behavior” among 225 undergraduate English majors. Structural equation modeling revealed that ideal and ought-to L2 writing selves directly and significantly influenced emotions, and that emotions significantly impacted the two dimensions of feedback-seeking behavior. More importantly, the ideal L2 writing self indirectly influenced feedback monitoring and feedback inquiry through the mediation of writing enjoyment. Nevertheless, writing boredom exercised no significant mediating effect between future L2 selves and feedback-seeking behavior. These findings reinforce the learner-centered perspective that positions students as proactive agents and provide notable implications for L2 writing instruction, advancing our understanding of teacher feedback.
• Learners with heightened L2 selves deployed more feedback-seeking strategies.
• Experiencing L2 enjoyment fostered distinct feedback-seeking behaviors.
• L2 boredom did not mediate the link between L2 selves and feedback-seeking behavior.
• More high-quality research evaluating L2 learners as proactive agents is needed.
-
Abstract
Peer evaluation is widely recognized for its educational benefits; however, its reliability and validity, particularly among adolescent second-language (L2) writers at the early stages of English language and literacy development, remain insufficiently explored. This explanatory sequential mixed-methods study investigated the reliability and validity of peer evaluation in English argumentative writing among 35 Grade 10 and 37 Grade 12 students from a public high school in Beijing, China. Twelve of the participating students (six at each grade) were interviewed about the validity, reliability, and value of peer evaluation. The findings indicated that peer evaluations demonstrated high levels of reliability and validity, with peer-assessed writing scores closely aligning with inter-teacher assessments. Notably, variations were observed among Grade 10 students, particularly in the evaluation of lower-order writing skills, such as grammar and vocabulary, which exhibited reduced validity. These results underscore the potential of peer evaluation in assessing higher-order content-level writing across varying levels of L2 English writing proficiency. The study also highlights areas where adolescent L2 writers may require additional support to enhance the effectiveness of peer evaluation practices in English argumentative writing. Implications for improving English argumentative writing instruction and refining peer evaluation strategies in high school L2 English classrooms are discussed.
• Peer evaluation shows high reliability, similar to inter-teacher rating.
• Peer evaluation works well for higher-order skills in L2 argumentative writing.
• 10th graders struggled with evaluating lower-order skills like grammar.
• 12th graders evaluated lower- and higher-order skills with greater validity than 10th graders.
-
Assessing the effects of explicit coherence instruction on EFL students’ integrated writing performance
Abstract
As a key attribute of effective writing, coherence remains challenging to teach in language classrooms, with traditional writing instruction frequently overlooking coherence in favor of discrete, rule-based features. This mixed-methods study investigates the effectiveness of explicit coherence instruction on English-as-a-Foreign-Language (EFL) students’ performance on integrated writing tasks. The study employed a controlled experimental design with 64 upper-intermediate-level undergraduate students at a Chinese university, drawing on Hasan’s Cohesive Harmony theory as the theoretical framework. Half of the participants (n = 32) in the experimental group received explicit instruction on coherence with a focus on cohesive chains and cohesive devices in integrated writing, while the control group (n = 32) received standard paraphrasing instruction. Quantitative analysis revealed that the experimental group showed significant improvements in coherence scores and multiple cohesive chain measures. Qualitative discourse analysis of six students’ writing samples from the experimental group demonstrated varying levels of improvement in writing coherence, with high-performing students showing better use of identity chains and pronoun references. The findings revealed that explicit instruction on coherence significantly improved students’ performance in creating coherent integrated writing, particularly through the development of cohesive chains and appropriate use of cohesive devices. This study underscores the pedagogical value of teaching coherence to enhance writing quality and provides concrete strategies for developing more effective teaching approaches for integrated writing tasks in EFL contexts.
• The study examined 64 Chinese EFL students using a mixed-methods experimental design.
• Cohesive Harmony theory served as the framework for assessing writing coherence.
• Explicit instruction significantly improved coherence in integrated writing tasks.
• High-performing students demonstrated superior identity chain development.
October 2025
-
Abstract
Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is challenging. This study aimed at examining a largely unexplored topic, namely the role of text characteristics in writing assessment, and analysed judgment accuracy differentiated by nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography and punctuation). Two hundred pre-service teachers assessed four randomly assigned texts from learners in grade six. Their assessment was compared to the existing ratings of two experts from a previous study. We found a relative judgment accuracy between r = .34 and .60 for the nine assessment criteria, with vocabulary being assessed significantly more accurately than almost all other criteria. Orthography, complexity and correctness of syntax and grammar, and punctuation were rated with significantly more accuracy than cohesion, level of detail, communicative effect and coherence. The pre-service teachers assessed most criteria more strictly and with higher variability than the experts. The results suggest that teacher education should offer pre-service teachers concrete opportunities to practise writing assessment, implement activities to strengthen the assessment of content- and structure-related criteria, and help them adjust their assessment rigour.
• Judgment accuracy in the assessment of primary school EFL learners’ texts.
• Relative judgment accuracy between r = .34 and .60 for the different criteria.
• Significant differences in relative judgment accuracy between assessment criteria.
• Linguistic text qualities are assessed with more accuracy than content- and structure-related aspects.
• Pre-service teachers are more rigorous and heterogeneous in rating than experts.
-
Exploring the scoring validity of holistic and dimension-based Comparative Judgements of young learners’ EFL writing
Abstract
Comparative Judgement (CJ) is a pairwise comparison evaluation method, typically conducted online. Multiple judges each compare the quality of a series of paired performances and, from their decisions, a rank order is constructed and scores calculated. Research across different educational contexts supports CJ’s reliability for evaluating written performances, permitting more precise scoring of scripts and for dimension-focused evaluation. However, scant insights are available about the basis of judges’ evaluations. This issue is important because argument-based approaches to validation (common in the field of language testing and adopted in this study) require evidence to support claims about how scores are appropriate for test purpose. Therefore, we investigate the scoring validity of CJ, both when used holistically (the standard application of CJ) and when evaluating scripts by individual criteria (termed dimensions in the research context). Twenty-seven judges evaluated 300 scripts addressing two writing task types in a national English as a Foreign Language examination for young learners in Austria. Judges reported via questionnaires what they had focused on while judging. Subsequently, eight judges provided think-aloud data while evaluating 157 scripts, offering further insight into the writing features they considered and their decision-making during CJ. Findings showed that while most judges adopted a decision-making process similar to traditional rating methods, some adapted their method to accommodate the nature of CJ evaluation. Furthermore, results indicated that the judges considered construct-relevant criteria when using CJ, both holistically and by dimension, thus offering support to an argument for the appropriateness of using CJ in this context.
• Comparative Judgement can offer an alternative to analytic rating of EFL writing.
• Judges with teaching or rating experience largely focus on relevant text features.
• Some judges adopt a decision-making process that appears well suited to CJ.
• Dimension-based CJ has the potential to provide richer feedback than holistic CJ.
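The rank order and scores in CJ are typically derived by fitting a Bradley-Terry model to the judges' pairwise decisions. A minimal sketch of that estimation step, using simple fixed-point (minorisation-maximisation) updates on made-up judgements; this illustrates the general technique, not the authors' implementation:

```python
# Each tuple records one judgement: (winner_script, loser_script).
# Made-up data for four scripts; real CJ sessions involve many judges.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
              ("C", "D"), ("B", "D"), ("A", "C"), ("B", "C")]

scripts = sorted({s for pair in judgements for s in pair})
strength = {s: 1.0 for s in scripts}  # Bradley-Terry strength parameters

# MM updates: each script's strength is its win count divided by the
# sum, over its comparisons, of 1 / (s_i + s_j).
for _ in range(200):
    wins = {s: 0 for s in scripts}
    denom = {s: 0.0 for s in scripts}
    for winner, loser in judgements:
        wins[winner] += 1
        d = 1.0 / (strength[winner] + strength[loser])
        denom[winner] += d
        denom[loser] += d
    strength = {s: wins[s] / denom[s] if wins[s] else 1e-6 for s in scripts}
    total = sum(strength.values())  # normalise for identifiability
    strength = {s: v / total for s, v in strength.items()}

ranking = sorted(scripts, key=strength.get, reverse=True)
print(ranking)  # scripts ordered from strongest to weakest
```

The fitted strengths (or their logarithms) serve as the script scores; the precision the abstract mentions comes from accumulating many such judgements per script across judges.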
April 2025
-
Validation of the individual and collective self-efficacy scale for teaching writing in post-secondary faculty
Abstract
Faculty actions in the classroom are known to impact student writing self-efficacy and academic achievement. The purpose of this paper was to validate Locke and Johnston’s Individual and Collective Self-Efficacy for Teaching Writing Scales, a tool originally validated in high school teachers, in a new population of post-secondary faculty. Exploratory and confirmatory factor analysis methods were used in two studies with independent samples of multidisciplinary faculty (N = 281) for the exploratory factor analysis (Study 1) and nursing discipline-specific faculty (N = 187) for the confirmatory factor analysis (Study 2). Three factors were identified in the questionnaire, which maintained the essence of the theoretical structure proposed by Locke and Johnston. Factor 1 was named Context and Process Competencies, Factor 2 Textural Competencies, and Factor 3 Motivational Competencies. This factor structure was confirmed with acceptable goodness of fit in the confirmatory factor analysis (Study 2). Learning to be a teacher of writing is a developmental process, and this measurement tool provides validation evidence that speaks to its usefulness in understanding that process.
• Instructional practices are known to impact student achievement levels.
• Faculty individual self-efficacy for teaching writing comprises three factors.
• Faculty undergo a slow enculturation into teaching writing.
• This scale can be used to assess the impact of teacher agency on student outcomes.
January 2025
-
Examining the use of academic vocabulary in first-year ESL undergraduates’ writing: A corpus-driven study in Hong Kong
Abstract
A good command of academic vocabulary is important for academic success in higher education. However, research has primarily focused on the receptive academic vocabulary knowledge of L2 learners while devoting relatively limited attention to their productive use of such vocabulary and its impact on writing quality. To address this gap, we analysed the problem-solution essays written by 168 first-year undergraduates in Hong Kong, focusing on the relationship between their use of academic words in the Academic Vocabulary List (AVL) and the overall quality of their writing. We also explored the relationship between the size of students’ receptive academic vocabulary and the frequency of its use in writing. Findings revealed that essays with high scores contained a greater density and diversity of academic vocabulary than low-scored essays, with greater frequency of words in the 1–500 and 501–1000 tiers of the AVL significantly predicting better writing quality. The essays also showed a significant relationship between the participants’ receptive academic vocabulary size and the diversity of academic words used in writing. However, no significant relationship was observed between receptive academic vocabulary size and the density of academic words used. We highlight the implications of these findings for EAP teaching and research.
• Problem-solution essays written by undergraduates in Hong Kong were analysed.
• Density and diversity of academic vocabulary (AV) predict L2 writing quality.
• Learners’ receptive AV size significantly relates to AV diversity in their writing.
• Only words from two tiers of the AVL significantly predicted writing scores.
• A holistic and tiered approach to assessing AV use is important.
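The density and diversity measures used in this line of research amount to checking essay tokens against the AVL. A minimal sketch with a hypothetical five-word stand-in for the list (the real AVL contains thousands of lemmas grouped into 500-word tiers, and real analyses lemmatise the text first):

```python
# Hypothetical stand-in for the Academic Vocabulary List (AVL);
# illustrative only, not the actual list.
avl = {"analyse", "significant", "factor", "method", "data"}

essay = ("the method was significant because the data showed "
         "a significant factor in the method").split()

avl_tokens = [w for w in essay if w in avl]   # AVL tokens in the essay
density = len(avl_tokens) / len(essay)        # proportion of AVL tokens
diversity = len(set(avl_tokens))              # distinct AVL types used

print(round(density, 3), diversity)
```

Tier-sensitive versions of the same counts (restricting `avl` to words 1–500, 501–1000, and so on) are what allow the kind of tiered analysis the study reports.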
July 2024
-
Beyond accuracy gains: Investigating the impact of individual and collaborative feedback processing on L2 writing development
Abstract
Despite the burgeoning research on exploring learner engagement with feedback, how second language (L2) learners’ engagement with feedback in different processing conditions influences their subsequent writing development is under-explored. This study examines the effects of individual and collaborative processing (languaging) of teacher feedback on Chinese lower-secondary school EFL learners’ writing development. Eighty-one students aged 13–14 with A1-A2 levels of English proficiency (according to the Common European Framework of Reference) from two classes and two experienced English teachers participated in the study. Students were provided with comprehensive teacher feedback and were asked to process feedback provided on three writing tasks through either individual written or collaborative oral languaging over six weeks. Pre-, post-, and delayed post-tests were administered. Students’ writing development was analysed using complexity, accuracy, and fluency measures, as well as content and organisation writing scores. Findings showed that the two conditions did not influence students’ writing complexity and fluency differently, while only the collaborative oral languaging condition contributed to students’ sustainable accuracy gains. Results based on the analytic writing scores suggested that students in the two conditions significantly improved content and organisation scores over time. Pedagogical and research implications regarding implementing the two feedback processing conditions are discussed.
April 2024
-
Abstract
Research into the contribution of multimodality to language learning is gaining momentum. While most studies pave the way for new understandings of language teaching and learning, there is an increasing demand for comprehensive assessment practices, particularly within higher education contexts. A few studies have emphasized the importance of reflecting on and establishing criteria for the assessment of multimodal literacy. This is necessary to understand students’ contributions in detail and to provide them with effective support in developing their multimodal skills. This study discusses the assessment of multimodal writing in English for Specific Purposes (ESP) contexts. It presents the design of an analytical tool for assessing multimodal texts and provides an example of its application. This tool covers assessment categories such as language use, content expression, interpersonal meaning, multimodality, and creativity and originality. As an example, we focus on the multimodal writing of a video game narrative, a genre that requires the integration of multiple modes of communication to convey meaning more effectively. Finally, this study offers pedagogical insights into the assessment of multimodal literacy in ESP.
October 2023
-
Abstract
Research on corrective feedback (CF) has developed from its original focus on identifying which type of CF is most effective for developing L2 language learners’ grammatical accuracy to focusing on how learners use CF. Underpinning this is the assumption that learners know what to do with CF when they receive it. The concept of “feedback literacy” challenges this assumption. Carless and Boud (2018) define feedback literacy as “the understandings, capacities and dispositions needed to make sense of information and use it to enhance work or learning strategies” (p. 1316). Our intention in this paper is to reflect on the manner in which theoretical and empirical work on feedback literacy can contribute to advancing L2 written corrective feedback (WCF) research agendas. Central in our proposal is the partially under-researched aspect of experience in terms of the L2 writers’ educational background experience, particularly experience with L1 and L2 writing. We further argue that how learners were taught L1 writing and how the L1 educational culture/society values writing can impact on how learners approach L2 writing tasks and accompanying feedback. Implications of this inclusive view of the learner for future research and pedagogy are discussed.
July 2023
-
Abstract
Peerceptiv is a peer assessment tool developed by learning sciences researchers to help students demonstrate disciplinary knowledge through writing feedback practices. This review of Peerceptiv describes its key features while comparing it with other writing feedback tools and suggesting possibilities and limitations of using it to support AI-based online writing assessment across the disciplines. Future considerations regarding the use of Peerceptiv in assessing, teaching, and researching online writing are discussed.
April 2022
-
The mediating effects of student beliefs on engagement with written feedback in preparation for high-stakes English writing assessment
Abstract
Research in L2 writing contexts has shown developing writers’ beliefs exert a powerful mediating effect on how they respond to written feedback. The mediating role of beliefs is magnified in preparation for high-stakes English writing assessment contexts, where tangible outcomes pivot on successful test performance. The present qualitative case study utilises data from semi-structured interviews to investigate how the beliefs of three self-directed IELTS preparation candidates mediated their affective, behavioural, and cognitive engagement with electronic teacher written feedback across three multi-draft Task 2 rehearsal essays. Utilising a metacognitive conceptual approach (Wenden, 1998), the study identified seven themes: 1) self-concept beliefs regulated engagement, 2) reliance on the expertise of a quality teacher, 3) engagement was mediated by individuals’ learning-to-write beliefs, 4) belief in comprehensive, critical written feedback, 5) feedback deemed transferable was more comprehensively engaged with, 6) entrenched test-taking strategy beliefs hindered engagement, and 7) supplementary self-directed learning activities were considered of limited value. The implications for practitioners of IELTS Writing preparation and the IELTS co-owners are discussed.
January 2019
-
Abstract
Numerous studies have examined the relationship between lexical features of students’ compositions and judgements of text quality. However, the degree to which teachers’ judgements are influenced by the quality of vocabulary in students’ essays with regard to their assessment of other textual characteristics is relatively unexplored. This experimental study investigates the influence of lexical features on teachers’ judgements of English as a second language (ESL) argumentative essays. Using analytic and holistic rating scales, English pre-service teachers (N = 37) in Switzerland assessed four essays of different proficiency levels in which the levels of lexical diversity and sophistication had been experimentally varied. Coh-Metrix software was used to manipulate the level of lexical diversity, as measured by MTLD and D, and the Tool for the Automatic Analysis of Lexical Sophistication (TAALES) software was used to obtain differing levels of lexical sophistication, as measured by word range. The results suggested that texts with greater lexical diversity and sophistication were assessed more positively concerning their overall quality as well as the analytic criteria ‘grammar’ and ‘frame of essay’. The implications of this study for classroom practice and teacher education are discussed.
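The lexical diversity indices mentioned here (MTLD and D) quantify how well the type-token ratio holds up as a text grows. A simplified, one-directional MTLD sketch follows; the full McCarthy and Jarvis measure averages forward and backward passes, and tools such as Coh-Metrix implement the complete index:

```python
def mtld_forward(tokens, threshold=0.72):
    """One-directional MTLD: mean length of sequential token runs
    that sustain a type-token ratio above the threshold."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count <= threshold:
            factors += 1              # a full factor: TTR fell to threshold
            types, count = set(), 0   # start a fresh segment
    if count:                         # partial factor for the leftover run
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    return len(tokens) / factors if factors else float("inf")

text = "the cat saw the dog and the dog saw the cat again".split()
print(round(mtld_forward(text), 2))
```

Higher MTLD values mean the text sustains vocabulary variety over longer stretches, which is the property the manipulated essays in this study varied.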
October 2018
-
Abstract
The British Academic Written English (BAWE) corpus (www.coventry.ac.uk/BAWE) comprises almost 3,000 pieces of university student writing distributed across four domains (Arts & Humanities, Life Sciences, Social Sciences, Physical Sciences) and four levels of study (from first year undergraduate to taught Master's level). The texts had all been submitted as part of regular university coursework, and had been awarded top grades, indicating that they had met disciplinary requirements in terms of level and task. The corpus was compiled to enable identification of the linguistic and generic features associated with successful university student writing. Our detailed analyses of the corpus led to the identification of thirteen genre families, and support the premises that university students write in a wider variety of genres than is commonly recognised, and that student writing differs across genres, disciplines and levels of university study. This review introduces the BAWE corpus and the associated genre family classification, then explains how they can be accessed and used for teaching and research purposes, how they have been used to deepen our understanding of academic writing in English, and where