Assessing Writing
149 articles
July 2026
April 2026
-
Pursuing fair writing assessment: Halo effects in primary school foreign language writing in grade six
Abstract
Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is associated with specific challenges because of learners’ limited language resources. This study investigates the extent to which characteristics of their texts trigger so-called halo effects. Halo effects are an assessment bias in which the quality of one feature unintentionally influences the evaluation of other aspects. The study examines halo effects across nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography, and punctuation), based on a random sample of narrative texts from a sixth-grade corpus. Two hundred pre-service teachers assessed four randomly assigned texts. Halo effects were calculated by comparison to expert ratings using multi-level regression analyses. Results show that orthography and vocabulary were the two main triggers of halo effects. Punctuation also triggered some halo effects, but to a smaller extent. The assessment of communicative effect and of the complexity and correctness of syntax and grammar was not determined by the corresponding text quality but dominated by other criteria. Results highlight the importance of being aware of halo effects when assessing young EFL learners’ texts and emphasise the need for suitable training measures.
• Analysis of halo effects across nine aspects of text quality.
• Random sample of narrative texts from a sixth-grade EFL corpus.
• Orthography and vocabulary are the two main triggers of halo effects.
• Punctuation also triggers halo effects but to a smaller extent.
• Halo effects call for awareness and targeted training.
-
How do L2 writing subskills interact hierarchically? Insights from diagnostic classification models
Abstract
This study examined the hierarchical structure among second/foreign language (L2) writing subskills using a Hierarchical Diagnostic Classification Model (HDCM). A pool of 500 essays composed by English as a Foreign Language (EFL) students was assessed by four experienced EFL teachers using the Empirically-derived Descriptor-based Diagnostic (EDD) checklist. Based on a literature review and the expertise of three content experts, several models were developed to reflect various hierarchical interactions among L2 writing subskills, including linear, divergent, convergent, independent, unstructured, mixed, and higher-order. The comparison of the models showed the presence of an unstructured interaction among L2 writing subskills, indicating that content is the foundational subskill for the mastery of vocabulary, grammar, organization, and mechanics. Higher mastery classes were also associated with higher educational levels, greater frequency of English use, and longer exposure to the L2. Understanding the hierarchical relationships among L2 writing subskills can improve targeted instructional strategies and assessment practices.
• Hierarchical DCMs represent a constrained version of existing DCMs.
• Models were developed to show hierarchical interactions among L2 writing subskills.
• An unstructured interaction among L2 writing subskills was identified.
• Higher mastery classes were associated with higher educational levels.
• The classes were associated with greater English use and longer L2 exposure.
January 2026
-
The effects of online resource use on L2 learners’ computer-mediated writing processes and written products
Abstract
While previous studies on online resource use in L2 writing have focused on the overall writing quality, limited attention has been paid to its effects on linguistic complexity and real-time writing processes. Addressing this gap, the present study explored how online resource use influences both the processes and products of L2 writing. Forty-nine intermediate L2 learners completed two computer-mediated argumentative writing tasks, either with or without the use of online resources. Writing behaviors were captured via keystroke logging and screen recording, and analyzed for search activity, fluency, pausing, and revision quantity. Cognitive processes were examined through stimulated recall interviews, and written products were evaluated for both quality and linguistic complexity. The results showed that participants spent an average of 14 % of task time using online resources, with considerable individual variation. Mixed-effects modeling revealed that resource use facilitated the production of more sophisticated words, with marginal influence on writing quality or syntactic complexity. Resource use was also associated with longer between-word pauses, fewer within-word pauses, and reduced revisions. These findings highlight the potential of online resource use to enhance the authenticity of L2 writing assessment tasks without compromising test validity, while encouraging the use of more advanced vocabulary in writing.
• Learners spent 14 % of the total writing task time using online resources.
• Online resource use had no significant impact on L2 writing quality.
• Online resource use improved lexical sophistication, not syntactic complexity.
• Online resource use reduced within-word pauses and aided spelling retrieval.
• Online resource use led to fewer revisions but did not affect fluency.
-
Unveiling the antecedents of feedback-seeking behavior in L2 writing: The impact of future L2 writing selves and emotions
Abstract
While existing research on second or foreign language (L2) feedback has predominantly focused on the effectiveness of various feedback practices and their impacts on writing performance, limited attention has been devoted to learners’ proactive role in seeking feedback, and how this important yet underexplored construct correlates with conative and affective variables remains insufficiently examined. To help fill that void, we explored the concept of feedback-seeking behavior and its antecedents in L2 writing by examining its correlations with future L2 writing selves and emotions, particularly unpacking the mediating effect of emotions in the emotion-driven chain of “motivation→emotion→increased or decreased behavior” among 225 undergraduate English-major students. Structural equation modeling revealed that ideal and ought-to L2 writing selves directly and significantly influenced emotions, and emotions significantly impacted the two dimensions of feedback-seeking behavior. More importantly, the ideal L2 writing self indirectly influenced feedback monitoring and feedback inquiry through the mediation of writing enjoyment. Nevertheless, writing boredom exercised no significant mediating effect between future L2 selves and feedback-seeking behavior. These findings reinforce the learner-centered perspective that positions students as proactive agents and provide some notable implications for L2 writing instruction, advancing our understanding of teacher feedback.
• Learners with heightened L2 selves deployed more feedback-seeking strategies.
• Experiencing L2 enjoyment fostered distinct feedback-seeking behaviors.
• No variations in L2 boredom existed in the link between L2 selves and behavior.
• More high-quality research evaluating L2 learners as proactive agents is needed.
-
The relation between linguistic accuracy and scoring of Swedish EFL students’ writing during a high-stakes exam
Abstract
This paper examines the effect of linguistic accuracy (i.e., the absence of formal, grammatical, and lexical errors) on scoring during the high-stakes national test of English in Swedish upper secondary school. Teachers are expected to score their own students’ texts with the help of assessment instructions containing benchmark texts (i.e., texts representing different score bands). The assessment instructions and the score bands provided to guide scoring are not explicit about how accuracy should influence scores. Two research questions were answered: As measured by ordinal regression, to what extent does linguistic accuracy predict rater scores? Do the texts scored by teachers reflect the graded example texts in terms of how linguistic accuracy predicts scores? The results revealed, amongst other things, that the overall frequency of errors in texts significantly predicted scores, with the model explaining approximately 58 % of the variance in the outcome variable according to Nagelkerke’s pseudo R-squared. Accuracy also had a similar effect on scores in texts rated by teachers as in the benchmark texts. It was concluded that accuracy may have more of an impact on scores than constructs that are more explicit components of the score bands, such as lexical complexity.
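The variance figure reported in this abstract is Nagelkerke’s pseudo R-squared, a rescaling of the Cox–Snell statistic so that a perfect model can reach 1. As a minimal, stdlib-only sketch (not the authors’ code; the log-likelihood values in the example are invented), it can be computed from the null and fitted model log-likelihoods:

```python
import math

def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke's pseudo R-squared from log-likelihoods.

    Cox-Snell: R2_cs = 1 - exp(2 * (ll_null - ll_model) / n).
    Its ceiling (reached by a perfect model) is 1 - exp(2 * ll_null / n),
    so Nagelkerke rescales Cox-Snell onto the 0-1 range.
    """
    r2_cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    r2_ceiling = 1.0 - math.exp(2.0 * ll_null / n)
    return r2_cox_snell / r2_ceiling

# Hypothetical ordinal regression of rater scores on error frequency:
# null-model and fitted-model log-likelihoods over n = 100 texts.
r2 = nagelkerke_r2(ll_null=-100.0, ll_model=-60.0, n=100)
print(round(r2, 3))
```

The statistic compares the fitted model against an intercept-only baseline, which is why it is a natural way to report how much of the score variance an accuracy predictor accounts for.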
-
Abstract
Peer evaluation is widely recognized for its educational benefits; however, its reliability and validity, particularly among adolescent second-language (L2) writers at the early stages of English language and literacy development, remain insufficiently explored. This explanatory sequential mixed-methods study investigated the reliability and validity of peer evaluation in English argumentative writing among 35 Grade 10 and 37 Grade 12 students from a public high school in Beijing, China. Twelve of the participating students (six at each grade) were interviewed about the validity, reliability, and value of peer evaluation. The findings indicated that peer evaluations demonstrated high levels of reliability and validity, with peer-assessed writing scores closely aligning with inter-teacher assessments. Notably, variations were observed among Grade 10 students, particularly in the evaluation of lower-order writing skills, such as grammar and vocabulary, which exhibited reduced validity. These results underscore the potential of peer evaluation in assessing higher-order, content-level writing across varying levels of L2 English writing proficiency. The study also highlights areas where adolescent L2 writers may require additional support to enhance the effectiveness of peer evaluation practices in English argumentative writing. Implications for improving English argumentative writing instruction and refining peer evaluation strategies in high school L2 English classrooms are discussed.
• Peer evaluation shows high reliability, similar to inter-teacher rating.
• Peer evaluation works well for higher-order skills in L2 argumentative writing.
• 10th graders struggled with evaluating lower-order skills like grammar.
• 12th graders evaluated lower- and higher-order skills with greater validity than 10th graders.
-
Abstract
The assessment of task-generated cognitive demands has been receiving increasing attention in task complexity research. However, scant attention has been paid to assessing cognitive demands when task complexity is manipulated along both resource-directing and resource-dispersing dimensions. To address this gap, the present study aimed to investigate the relative effects of reasoning demands and prior knowledge on cognitive demands in L2 writing. Eighty-eight EFL students completed two letter-writing tasks with varying reasoning demands under one of two conditions, that is, either with or without prior knowledge available. Cognitive demands were assessed with a post-task questionnaire, a dual-task method, and open-ended questions. The results revealed that reasoning demands and prior knowledge were strong determinants of cognitive demands, which provides empirical evidence for Robinson’s Cognition Hypothesis. Moreover, the post-task questionnaire, the dual-task method, and the open-ended questions were found to assess distinct aspects of cognitive demands, which highlights the importance of data triangulation in exploring task complexity effects. The study provides language teachers and assessors with implications for task design and implementation.
• How reasoning demands and prior knowledge affect cognitive demands was underexplored.
• Cognitive demands were assessed by both quantitative and qualitative methods.
• Findings supported some assumptions underlying Robinson’s framework.
• The independent measures assessed distinct aspects of cognitive demands.
-
Assessing the effects of explicit coherence instruction on EFL students’ integrated writing performance
Abstract
As a key attribute of effective writing, coherence remains challenging to teach in language classrooms, with traditional writing instruction frequently overlooking coherence in favor of discrete, rule-based features. This mixed-methods study investigates the effectiveness of explicit coherence instruction on English-as-a-Foreign-Language (EFL) students’ performance on integrated writing tasks. The study employed a controlled experimental design with 64 upper-intermediate-level undergraduate students at a Chinese university, drawing on Hasan’s Cohesive Harmony theory as the theoretical framework. The experimental group (n = 32) received explicit instruction on coherence with a focus on cohesive chains and cohesive devices in integrated writing, while the control group (n = 32) received standard paraphrasing instruction. Quantitative analysis revealed that the experimental group showed significant improvements in coherence scores and multiple cohesive chain measures. Qualitative discourse analysis of six students’ writing samples from the experimental group demonstrated varying levels of improvement in writing coherence, with high-performing students showing better use of identity chains and pronoun references. The findings revealed that explicit instruction on coherence significantly improved students’ performance in creating coherent integrated writing, particularly through the development of cohesive chains and appropriate use of cohesive devices. This study underscores the pedagogical value of teaching coherence to enhance writing quality and provides concrete strategies for developing more effective teaching approaches for integrated writing tasks in EFL contexts.
• The study examined 64 Chinese EFL students using a mixed-methods experimental design.
• Cohesive Harmony theory served as the framework for assessing writing coherence.
• Explicit instruction significantly improved coherence in integrated writing tasks.
• High-performing students demonstrated superior identity chain development.
-
Is it beneficial to strive for perfection in writing? Exploring the relationship between perfectionism, motivational regulation, and second language (L2) writing performance
Abstract
Perfectionism, a personality trait characterized by the pursuit of flawlessness and high personal standards, and motivational regulation, the strategies through which individuals manage their motivational states, have received limited attention in second language (L2) writing. Framed within social cognitive theory, this study examines how two dimensions of perfectionism—perfectionistic strivings and perfectionistic concerns—relate to writing performance (syntactic complexity, accuracy, lexical complexity, and fluency) and how motivational regulation sub-strategies (interest enhancement, self-talk, and emotional control) mediate these relationships. Data from 689 university students in China were analyzed using questionnaires and argumentative writing samples. Results indicated that perfectionistic strivings positively predicted syntactic complexity, accuracy, and lexical complexity, while perfectionistic concerns negatively predicted these dimensions; neither dimension significantly affected fluency. Crucially, motivational regulation sub-strategies partially mediated the relations between perfectionism and writing performance. These findings underscore the importance of distinguishing perfectionism dimensions and targeting motivational regulation strategies to improve L2 writing. Implications for instruction and directions for future longitudinal research are discussed.
• Perfectionistic strivings and concerns affect writing via motivational regulation.
• Strivings improve syntax, accuracy, and lexical complexity; concerns hinder them.
• Most motivational regulation sub-strategies mediate perfectionism’s impact on CALF (complexity, accuracy, lexical complexity, and fluency).
• Perfectionism influences writing through motivational regulation.
October 2025
-
Abstract
Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is challenging. This study aimed at examining a largely unexplored topic, namely the role of text characteristics in writing assessment, and analysed judgment accuracy differentiated by nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography, and punctuation). Two hundred pre-service teachers assessed four randomly assigned texts from learners in grade six. Their assessments were compared to the existing ratings of two experts from a previous study. We found a relative judgment accuracy between r = .34 and .60 for the nine assessment criteria, with vocabulary being assessed significantly more accurately than almost all other criteria. Orthography, punctuation, and the complexity and correctness of syntax and grammar were rated significantly more accurately than cohesion, level of detail, communicative effect, and coherence. The pre-service teachers assessed most criteria more strictly and with higher variability than the experts. The results suggest that teacher education should offer pre-service teachers concrete opportunities to practise writing assessment, implement activities to strengthen the assessment of content- and structure-related criteria, and help them adjust their assessment rigour.
• Judgment accuracy in the assessment of primary school EFL learners’ texts.
• Relative judgment accuracy between r = .34 and .60 for the different criteria.
• Significant differences in relative judgment accuracy between assessment criteria.
• Linguistic text qualities are assessed with more accuracy than content- and structure-related aspects.
• Pre-service teachers are more rigorous and heterogeneous in rating than experts.
-
Exploring the scoring validity of holistic and dimension-based Comparative Judgements of young learners’ EFL writing
Abstract
Comparative Judgement (CJ) is a pairwise comparison evaluation method, typically conducted online. Multiple judges each compare the quality of a series of paired performances and, from their decisions, a rank order is constructed and scores are calculated. Research across different educational contexts supports CJ’s reliability for evaluating written performances, permitting more precise scoring of scripts and allowing for dimension-focused evaluation. However, scant insights are available about the basis of judges’ evaluations. This issue is important because argument-based approaches to validation (common in the field of language testing and adopted in this study) require evidence to support claims about how scores are appropriate for test purpose. Therefore, we investigate the scoring validity of CJ, both when used holistically (the standard application of CJ) and when evaluating scripts by individual criteria (termed dimensions in the research context). Twenty-seven judges evaluated 300 scripts addressing two writing task types in a national English as a Foreign Language examination for young learners in Austria. Judges reported via questionnaires what they had focused on while judging. Subsequently, eight judges provided think-aloud data while evaluating 157 scripts, offering further insight into the writing features they considered and their decision-making during CJ. Findings showed that while most judges adopted a decision-making process similar to traditional rating methods, some adapted their method to accommodate the nature of CJ evaluation. Furthermore, results indicated that the judges considered construct-relevant criteria when using CJ, both holistically and by dimension, thus offering support to an argument for the appropriateness of using CJ in this context.
• Comparative Judgement can offer an alternative to analytic rating of EFL writing.
• Judges with teaching or rating experience largely focus on relevant text features.
• Some judges adopt a decision-making process that appears well suited to CJ.
• Dimension-based CJ has the potential to provide richer feedback than holistic CJ.
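The rank-order construction that CJ relies on is typically done by fitting a Bradley–Terry model to the judges’ pairwise decisions. The sketch below is a generic illustration of that idea, not the CJ platform used in the study; the script names and comparison counts are invented. It implements Zermelo’s iterative estimator over (winner, loser) pairs:

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs
    using Zermelo's iterative algorithm."""
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)        # total wins per item
    pair_count = defaultdict(int)  # judgements per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_count[frozenset((winner, loser))] += 1
    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # expected-comparison denominator for item i
            denom = sum(
                pair_count[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items if j != i
            )
            updated[i] = wins[i] / denom
        # normalise so strengths average 1 (the scale is arbitrary)
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}
    return strength

# Three scripts; each pair was judged four times (counts invented)
decisions = ([("A", "B")] * 3 + [("B", "A")]
             + [("B", "C")] * 3 + [("C", "B")]
             + [("A", "C")] * 3 + [("C", "A")])
strengths = bradley_terry(decisions)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Sorting scripts by estimated strength yields the rank order; the strengths themselves (or their logarithms) serve as the calculated scores the abstract mentions.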
July 2025
-
Abstract
The notion of grammatical metaphor (GM) (Halliday, 1985) refers to a writer shifting an action or quality into a ‘thing’. As in most senses of metaphor, the goal is to “represent something as something else” (McGrath & Liardét, 2023, p. 33). This study investigated the use of GM in Linguaskill writing exam responses across CEFR proficiency levels (below-B1 to C1 or above). It drew on a pre-existing GM list (see McGrath & Liardét, 2023) to explore GM frequency in L2 responses and its correlation with proficiency scores, and qualitatively examined how candidates used GMs in their responses. Results show a moderate positive correlation between proficiency and GM use, with a dominance of process-to-thing shifts (e.g., transform→transformation) and the emergence of GM use from lower to higher proficiency levels. This underscores GM's significance in crafting academically valued meanings in L2 contexts, suggesting its potential for informing instructional and assessment practices.
• Metaphorisation in writing is a useful metric for L2 writing assessment.
• Evidence suggests GM frequency correlates with increased performance.
• Learners progress from emergent arguments to presenting ideas more concisely.
• The majority of GM shifts were to ‘things’.
• The study provides further weight to arguments for meaning-based complexity.
April 2025
January 2025
-
Examining the use of academic vocabulary in first-year ESL undergraduates’ writing: A corpus-driven study in Hong Kong
Abstract
A good command of academic vocabulary is important for academic success in higher education. However, research has primarily focused on the receptive academic vocabulary knowledge of L2 learners while devoting relatively limited attention to their productive use of such vocabulary and its impact on writing quality. To address this gap, we analysed the problem-solution essays written by 168 first-year undergraduates in Hong Kong, focusing on the relationship between their use of academic words in the Academic Vocabulary List (AVL) and the overall quality of their writing. We also explored the relationship between the size of students’ receptive academic vocabulary and the frequency of its use in writing. Findings revealed that high-scoring essays contained a greater density and diversity of academic vocabulary than low-scoring essays, with greater frequency of words in the 1–500 and 501–1000 tiers of the AVL significantly predicting better writing quality. The essays also showed a significant relationship between the participants’ receptive academic vocabulary size and the diversity of academic words used in writing. However, no significant relationship was observed between receptive academic vocabulary size and the density of academic words used. We highlight the implications of these findings for EAP teaching and research.
• Problem-solution essays written by undergraduates in Hong Kong were analysed.
• Density and diversity of academic vocabulary (AV) predict L2 writing quality.
• Learners’ receptive AV size significantly relates to AV diversity in their writing.
• Only words from two tiers of the AVL significantly predicted writing scores.
• A holistic and tiered approach to assessing AV use is important.
-
Investigating the effectiveness of scaffolded feedback on EFL Saudi students' writing accuracy: A longitudinal classroom-based study
Abstract
Despite the growing body of research on feedback provided to L2 learners on their writing, few studies have investigated the use of a scaffolded approach to feedback. Sociocultural scholars argue that for feedback to be effective it needs to be scaffolded – dynamic and aligned to the learner’s ability to correct their errors (Aljaafreh & Lantolf, 1994). Although research on scaffolded feedback has found it to improve L2 writing accuracy, most of this research has been small-scale, using one-on-one conferences. This larger classroom-based study aimed to examine the effectiveness of scaffolded written feedback and students’ perceptions of this feedback approach. The study was quasi-experimental and implemented over one academic semester. The participants were 71 male students of intermediate English proficiency, majoring in English at a large Saudi university. They were divided into two groups: one group received scaffolded feedback; the other group received unscaffolded (indirect) feedback. The feedback targeted eight grammatical structures. Findings from the immediate and delayed post-tests showed that both groups improved in their overall writing accuracy over time, with no difference evident between the two groups. Moreover, both groups showed similar improvements in six of the eight targeted grammatical structures. The scaffolded feedback group showed greater improvement than their counterparts on only two structures: subject-verb agreement and singular-plural agreement. Interview findings showed that the scaffolded feedback group liked this approach mainly because of its novelty but preferred scaffolding only when it increased in explicitness. We conclude by considering whether and how scaffolded feedback can be provided in classroom settings.
• Scaffolded and unscaffolded written corrective feedback (WCF) both enhance EFL writing accuracy.
• Scaffolded WCF shows limited superiority in improving writing accuracy compared to unscaffolded WCF.
• Saudi EFL students preferred scaffolded WCF, with explicit feedback being more appreciated over time.
• Implicit WCF posed challenges for Saudi EFL students, leading to reduced response rates as feedback became more implicit.
October 2024
July 2024
-
Abstract
Directed self-placement (DSP) allows for student agency in writing placement. DSP has been implemented in many composition programs, although it has not been used as widely for L2 writers in higher education. This study investigates the relationship between student placement decisions and students’ prior educational backgrounds, particularly in relation to whether they had attended an English-medium high school or an intensive English program (IEP). Actual placement results from an exam were compared to 804 students’ self-placement decisions and correlated with their prior educational backgrounds. Findings indicated that most students’ DSP decisions matched actual exam placement results. However, a sizeable number of DSP decisions were higher or lower than the exam placement results. Additionally, the longer students had studied at an English-medium instruction high school, the more likely they were to place themselves higher than their exam placement. We conclude that DSP can be used in L2 writing programs, but with careful attention to learners’ educational backgrounds, proficiency, and sense of identity.
-
Beyond accuracy gains: Investigating the impact of individual and collaborative feedback processing on L2 writing development
Abstract
Despite the burgeoning research on learner engagement with feedback, how second language (L2) learners’ engagement with feedback in different processing conditions influences their subsequent writing development is under-explored. This study examines the effects of individual and collaborative processing (languaging) of teacher feedback on Chinese lower-secondary school EFL learners’ writing development. Eighty-one students aged 13–14 with A1–A2 levels of English proficiency (according to the Common European Framework of Reference) from two classes, along with two experienced English teachers, participated in the study. Students were provided with comprehensive teacher feedback and were asked to process the feedback on three writing tasks through either individual written or collaborative oral languaging over six weeks. Pre-, post-, and delayed post-tests were administered. Students’ writing development was analysed using complexity, accuracy, and fluency measures, as well as content and organisation writing scores. Findings showed that the two conditions did not influence students’ writing complexity and fluency differently, while only the collaborative oral languaging condition contributed to sustainable accuracy gains. Results based on the analytic writing scores suggested that students in both conditions significantly improved their content and organisation scores over time. Pedagogical and research implications regarding implementing the two feedback processing conditions are discussed.
-
EFL students' syntactic complexity development in argumentative writing: A latent class growth analysis (LCGA) approach
Abstract
The study explored EFL students' development of syntactic complexity by employing the Latent Class Growth Analysis (LCGA) approach. A total of 214 tertiary EFL students from Southwest China were invited to write four argumentative essays over an academic semester. The unconditional models of LCGA were utilized to explore the optimal latent classes of students' development trajectories of syntactic complexity. The conditional models of LCGA were employed to investigate the predictive effect of English proficiency on the optimal latent classes. Results of the unconditional models revealed distinct latent classes of development trajectories for six indices of syntactic complexity but not for the remaining ones, offering tentative evidence for the heterogeneity of L2 development trajectories. Results of the conditional models showed that English proficiency did not predict membership in these latent classes. These results are discussed and implications for L2 instruction are offered.
April 2024
-
Is the variation in syntactic complexity features observed in argumentative essays produced by B1 level EFL learners in Finland and Pakistan attributable exclusively to their L1?
Abstract
This study explored the syntactic complexity features of English learners at the B1 Common European Framework of Reference (CEFR) (CoE, 2001) level from both Pakistan and Finland. The learners in question were taught English as a Foreign Language (EFL) using different pedagogical methods. The study took into account various factors, including the learners' proficiency level, age, and grade, as well as variations in their native language. To assess the impact of the learners' native language and pedagogical methods on syntactic complexity features, twelfth-grade EFL students from upper-secondary schools in both nations were given identical instructions and time limits to complete an English academic essay on the same topic. The study utilized the L2 Syntactic Complexity Analyzer (L2SCA) to extract fourteen syntactic complexity features, and Mann-Whitney U tests were used to analyze the differences in the syntactic complexity features between the two groups. The study revealed significant differences between Finnish and Pakistani EFL learners due to variations in their native language and the effects of pedagogical methods on syntactic complexity features. The implications of this study extend to language testing and assessment, the CEFR framework, and pedagogy in both Finland and Pakistan.
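The Mann-Whitney U test used for these group comparisons is rank-based and assumes no particular score distribution, which suits skewed complexity indices. As a minimal, stdlib-only illustration (in practice one would use a library routine such as scipy.stats.mannwhitneyu; the feature values below are invented, not the study's data), the U statistic for two groups can be computed as:

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a vs group_b, averaging ranks over ties."""
    pooled = sorted(group_a + group_b)
    # average 1-based rank of each distinct value in the pooled sample
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    rank_sum_a = sum(ranks[v] for v in group_a)
    n_a = len(group_a)
    # U = R_a - n_a(n_a + 1)/2
    return rank_sum_a - n_a * (n_a + 1) / 2.0

# Invented per-essay values of one complexity index for each group
finnish = [5.2, 6.1, 4.8, 7.0]
pakistani = [3.9, 4.1, 5.0, 4.4]
u = mann_whitney_u(finnish, pakistani)
```

The two one-sided statistics are complementary (U1 + U2 = n1 * n2), so either can be reported; significance is then read off exact tables for small samples or a normal approximation for larger ones.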
-
Abstract
Research into the contribution of multimodality to language learning is gaining momentum. While most studies pave the way for new understandings of language teaching and learning, there is an increasing demand for comprehensive assessment practices, particularly within higher education contexts. A few studies have emphasized the importance of reflecting on and establishing criteria for the assessment of multimodal literacy. This is necessary to understand students’ contributions in detail and to provide them with effective support in developing their multimodal skills. This study discusses the assessment of multimodal writing in English for Specific Purposes (ESP) contexts. It presents the design of an analytical tool for assessing multimodal texts and provides an example of its application. This tool covers assessment categories such as language use, content expression, interpersonal meaning, multimodality, and creativity and originality. As an example, we focus on the multimodal writing of a video game narrative, a genre that requires the integration of multiple modes of communication to convey meaning more effectively. Finally, this study offers pedagogical insights into the assessment of multimodal literacy in ESP.