Assessing Writing
30 articles
April 2026
-
Pursuing fair writing assessment: Halo effects in primary school foreign language writing in grade six ↗
Abstract
Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is associated with specific challenges because of learners’ limited language resources. This study investigates the extent to which characteristics of their texts trigger so-called halo effects. Halo effects are an assessment bias where the quality of one feature unintentionally influences the evaluation of other aspects. The study examines halo effects across nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography and punctuation), based on a random sample of narrative texts from a sixth-grade corpus. Two hundred pre-service teachers assessed four randomly assigned texts. Halo effects were calculated by comparison to expert ratings using multi-level regression analyses. Results show that orthography and vocabulary were the two main triggers of halo effects. Punctuation also triggered some halo effects, but to a smaller extent. The assessment of communicative effect, complexity and correctness of syntax and grammar was not determined by the corresponding text quality but dominated by other criteria. Results highlight the importance of being aware of halo effects when assessing young EFL learners’ texts and emphasise the need for suitable training measures.
• Analysis of halo effects across nine aspects of text quality.
• Random sample of narrative texts from a sixth-grade EFL corpus.
• Orthography and vocabulary are the two main triggers of halo effects.
• Punctuation also triggers halo effects but to a smaller extent.
• Halo effects call for awareness and targeted training.
-
How do L2 writing subskills interact hierarchically? Insights from diagnostic classification models ↗
Abstract
This study examined the hierarchical structure among second/foreign language (L2) writing subskills using a Hierarchical Diagnostic Classification Model (HDCM). A pool of 500 essays composed by English as a Foreign Language (EFL) students was assessed by four experienced EFL teachers using the Empirically-derived Descriptor-based Diagnostic (EDD) checklist. Based on a literature review and the expertise of three content experts, several models were developed to reflect various hierarchical interactions among L2 writing subskills, including linear, divergent, convergent, independent, unstructured, mixed, and higher-order. The comparison of the models showed the presence of an unstructured interaction among L2 writing subskills, indicating that content is the foundational subskill for the mastery of vocabulary, grammar, organization, and mechanics. Higher mastery classes were also associated with higher educational levels, greater frequency of English use, and longer exposure to the L2. Understanding the hierarchical relationships among L2 writing subskills can improve targeted instructional strategies and assessment practices.
• Hierarchical DCMs are a constrained version of existing DCMs.
• Models were developed to show hierarchical interactions among L2 writing subskills.
• An unstructured interaction among L2 writing subskills was identified.
• Higher mastery classes were associated with higher educational levels.
• The classes were associated with greater English use and longer L2 exposure.
January 2026
-
The effects of online resource use on L2 learners’ computer-mediated writing processes and written products ↗
Abstract
While previous studies on online resource use in L2 writing have focused on overall writing quality, limited attention has been paid to its effects on linguistic complexity and real-time writing processes. Addressing this gap, the present study explored how online resource use influences both the processes and products of L2 writing. Forty-nine intermediate L2 learners completed two computer-mediated argumentative writing tasks, either with or without the use of online resources. Writing behaviors were captured via keystroke logging and screen recording, and analyzed for search activity, fluency, pausing, and revision quantity. Cognitive processes were examined through stimulated recall interviews, and written products were evaluated for both quality and linguistic complexity. The results showed that participants spent an average of 14 % of task time using online resources, with considerable individual variation. Mixed-effects modeling revealed that resource use facilitated the production of more sophisticated words, with marginal influence on writing quality or syntactic complexity. Resource use was also associated with longer between-word pauses, fewer within-word pauses, and reduced revisions. These findings highlight the potential of online resource use to enhance the authenticity of L2 writing assessment tasks without compromising test validity, while encouraging the use of more advanced vocabulary in writing.
• Learners spent 14 % of the total writing task time using online resources.
• Online resource use had no significant impact on L2 writing quality.
• Online resource use improved lexical sophistication, not syntactic complexity.
• Online resource use reduced within-word pauses and aided spelling retrieval.
• Online resource use led to fewer revisions but did not affect fluency.
-
Abstract
Peer evaluation is widely recognized for its educational benefits; however, its reliability and validity, particularly among adolescent second-language (L2) writers at the early stages of English language and literacy development, remain insufficiently explored. This explanatory sequential mixed-methods study investigated the reliability and validity of peer evaluation in English argumentative writing among 35 Grade 10 and 37 Grade 12 students from a public high school in Beijing, China. Twelve of the participating students (six per grade) were interviewed about the validity, reliability, and value of peer evaluation. The findings indicated that peer evaluations demonstrated high levels of reliability and validity, with peer-assessed writing scores closely aligning with inter-teacher assessments. Notably, variations were observed among Grade 10 students, particularly in the evaluation of lower-order writing skills, such as grammar and vocabulary, which exhibited reduced validity. These results underscore the potential of peer evaluation in assessing higher-order content-level writing across varying levels of L2 English writing proficiency. The study also highlights areas where adolescent L2 writers may require additional support to enhance the effectiveness of peer evaluation practices in English argumentative writing. Implications for improving English argumentative writing instruction and refining peer evaluation strategies in high school L2 English classrooms are discussed.
• Peer evaluation shows high reliability, similar to inter-teacher rating.
• Peer evaluation works well for higher-order skills in L2 argumentative writing.
• 10th graders struggled with evaluating lower-order skills like grammar.
• 12th graders evaluated lower- and higher-order skills with greater validity than 10th graders.
-
Is it beneficial to strive for perfection in writing? Exploring the relationship between perfectionism, motivational regulation, and second language (L2) writing performance ↗
Abstract
Perfectionism, a personality trait characterized by the pursuit of flawlessness and high personal standards, and motivational regulation, the strategies through which individuals manage their motivational states, have received limited attention in second language (L2) writing. Framed within social cognitive theory, this study examines how two dimensions of perfectionism—perfectionistic strivings and perfectionistic concerns—relate to writing performance (syntactic complexity, accuracy, lexical complexity, and fluency) and how motivational regulation sub-strategies (interest enhancement, self-talk, and emotional control) mediate these relationships. Data from 689 university students in China were analyzed using questionnaires and argumentative writing samples. Results indicated that perfectionistic strivings positively predicted syntactic complexity, accuracy, and lexical complexity, while perfectionistic concerns negatively predicted these dimensions; neither dimension significantly affected fluency. Crucially, motivational regulation sub-strategies partially mediated the relations between perfectionism and writing performance. These findings underscore the importance of distinguishing perfectionism dimensions and targeting motivational regulation strategies to improve L2 writing. Implications for instruction and directions for future longitudinal research are discussed.
• Perfectionistic strivings and concerns affect writing via motivational regulation.
• Strivings improve syntax, accuracy, and lexical complexity; concerns hinder them.
• Most motivational regulation sub-strategies mediate perfectionism’s impact on CALF (complexity, accuracy, lexical complexity, and fluency).
• Perfectionism influences writing through motivational regulation.
October 2025
-
Abstract
Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is challenging. This study aimed to examine a largely unexplored topic, namely the role of text characteristics in writing assessment, and analysed judgment accuracy differentiated by nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography and punctuation). Two hundred pre-service teachers assessed four randomly assigned texts from learners in grade six. Their assessment was compared to the existing ratings of two experts from a previous study. We found a relative judgment accuracy between r = .34 and .60 for the nine assessment criteria, with vocabulary being assessed significantly more accurately than almost all other criteria. Orthography, complexity and correctness of syntax and grammar, and punctuation were rated with significantly more accuracy than cohesion, level of detail, communicative effect and coherence. The pre-service teachers assessed most criteria more strictly and with higher variability than the experts. The results suggest that teacher education should offer pre-service teachers concrete opportunities to practise writing assessment, implement activities to strengthen the assessment of content- and structure-related criteria, and help them adjust their assessment rigour.
• Judgment accuracy in the assessment of primary school EFL learners’ texts.
• Relative judgment accuracy between r = .34 and .60 for the different criteria.
• Significant differences in relative judgment accuracy between assessment criteria.
• Linguistic text qualities are assessed with more accuracy than content- and structure-related aspects.
• Pre-service teachers are more rigorous and heterogeneous in rating than experts.
July 2025
-
A meta-analysis of relationships between syntactic features and writing performance and how the relationships vary by student characteristics and measurement features ↗
Abstract
Students’ proficiency in constructing sentences impacts the writing process and writing products. Linguistic demands in writing differ in terms of both student characteristics and measurement features. To identify various syntactic demands considering these features, we conducted a meta-analysis examining the relationships between syntactic features (complexity and accuracy) and writing performance (quality, productivity, and fluency) and the moderating effects of both student characteristics and measurement features. A total of 109 studies (871 effect sizes; 24,628 participants in total) met the inclusion criteria. Results showed a weak relationship for syntactic accuracy (r = .25) and for complexity (r = .16). Writer characteristics (grade level and language proficiency) and measurement features (writing genre, writing outcome, whether the writing task was text-based, and the type of syntactic complexity measure) were significant moderators for certain syntactic features. The findings highlight the importance of writer and measurement factors when considering the relationships between linguistic features in writing and writing performance. Implications are discussed regarding the selection of syntactic features in assessing language use in writing, gaps in the literature, and significance for writing instruction and assessment.
• Aimed to depict the relationships between syntactic features and writing performance.
• Found weak relationships between syntactic features and writing outcomes.
• Relationships vary as a function of student characteristics and measurement features.
• Noun phrase complexity might be more valid than some traditional syntactic complexity measures.
• Findings have important implications for writing assessments.
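The pooled correlations reported above (r = .25 and r = .16) are conventionally computed by Fisher z-transforming each study's r, averaging with inverse-variance weights, and back-transforming. A minimal sketch of that arithmetic, using made-up effect sizes rather than the meta-analysis's data:

```python
import math

def pooled_correlation(studies):
    """Fixed-effect pooling of correlations via Fisher z.

    studies: list of (r, n) pairs. Each study's weight is n - 3,
    the inverse sampling variance of its z-transformed correlation.
    """
    num = den = 0.0
    for r, n in studies:
        z = math.atanh(r)   # Fisher z-transform of r
        w = n - 3           # inverse variance weight
        num += w * z
        den += w
    z_bar = num / den
    return math.tanh(z_bar) # back-transform the weighted mean to r

# Hypothetical effect sizes, not taken from the meta-analysis:
print(pooled_correlation([(0.30, 120), (0.20, 400), (0.25, 80)]))
```

Random-effects pooling, which meta-analyses of this kind typically report, adds a between-study variance component to each weight, but the transform-average-back-transform skeleton is the same.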
July 2024
-
EFL students' syntactic complexity development in argumentative writing: A latent class growth analysis (LCGA) approach ↗
Abstract
The study explored EFL students' development of syntactic complexity by employing the Latent Class Growth Analysis (LCGA) approach. A total of 214 tertiary EFL students from Southwest China were invited to write four argumentative essays over an academic semester. Unconditional LCGA models were used to identify the optimal latent classes of students' development trajectories of syntactic complexity, and conditional models were used to investigate whether English proficiency predicted membership in those classes. Results of the unconditional models revealed distinct latent classes of development trajectories for six of the syntactic complexity indices but not for the remaining ones, offering tentative evidence for the heterogeneity of L2 development trajectories. Results of the conditional models showed that English proficiency did not predict membership in these latent classes. These results are discussed and implications for L2 instruction are offered.
April 2024
-
Is the variation in syntactic complexity features observed in argumentative essays produced by B1 level EFL learners in Finland and Pakistan attributable exclusively to their L1? ↗
Abstract
This study explored the syntactic complexity features of English learners at the B1 level of the Common European Framework of Reference (CEFR; CoE, 2001) from Pakistan and Finland. The learners in question were taught English as a Foreign Language (EFL) using different pedagogical methods. The study took into account various factors, including the learners' proficiency level, age, and grade, as well as variations in their native language. To assess the impact of the learners' native language and pedagogical methods on syntactic complexity features, twelfth-grade EFL students from upper-secondary schools in both nations were given identical instructions and time limits to complete an English academic essay on the same topic. The study utilized the L2 Syntactic Complexity Analyzer (L2SCA) to extract fourteen syntactic complexity features, and Mann-Whitney U tests were used to analyze the differences in these features between the two groups. The study revealed significant differences between Finnish and Pakistani EFL learners, attributable to variations in their native language and to the effects of pedagogical methods on syntactic complexity features. The implications of this study extend to language testing and assessment, the CEFR framework, and pedagogy in both Finland and Pakistan.
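The Mann-Whitney U statistic used in this study is simple enough to compute directly: U counts, over all cross-group pairs, how often a value from one group exceeds a value from the other (ties count half). A pure-Python sketch on hypothetical mean-length-of-T-unit values — the feature name and numbers are illustrative, not the study's data; in practice one would use `scipy.stats.mannwhitneyu`, which also supplies the p-value:

```python
from itertools import product

def mann_whitney_u(x, y):
    """U statistic for group x vs. group y: count of pairs (a, b)
    with a from x and b from y where a > b; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(x, y))

# Hypothetical mean-length-of-T-unit values for two learner groups:
finland = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 13.2, 14.7]
pakistan = [11.9, 12.4, 13.1, 11.5, 12.8, 12.0, 13.3, 11.7]

u = mann_whitney_u(finland, pakistan)
print(f"U = {u} out of {len(finland) * len(pakistan)} pairs")
```

A U close to the maximum (here 64) or to zero signals near-complete separation of the two groups on that feature; significance is then read off U's null distribution.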
-
Assessing writing and spelling interest and self-beliefs: Does the type of pictorial support affect first and third graders’ responses? ↗
Abstract
An array of pictorial supports (e.g., emojis, geometrical figures, animals) is often used in studies assessing young students’ writing motivation with Likert scales. However, although these images may influence the students’ responses, sufficient rationales for these choices are often absent from the studies. To the best of our knowledge, the present study is the first to investigate two different types of pictorial support (circles vs. faces) in Likert scales assessing first and third graders’ writing interest, self-concept, and spelling interest and self-efficacy. The samples consist of 2197 first graders (mean age 6.8 years) and 1740 third graders (mean age 8.4 years). Results show statistically significant differences among the scales, indicating that when face-scales are used, first graders skip motivation items more often and students in both grades avoid the minimum values of the scale more often. Gender differences are also found, indicating that when face-scales are used, boys in third grade avoid maximum values more often and girls in both grades avoid the minimum values more often. These findings suggest that circle-scales are more appropriate than face-scales for measuring young students’ writing and spelling interest and self-beliefs.
January 2024
-
Abstract
The present study used the Mixed Rasch Model (MRM) to identify multiple profiles in L2 students’ writing with regard to several linguistic features, including content, organization, grammar, vocabulary, and mechanics. To this end, a pool of 500 essays written by English as a foreign language (EFL) students was rated by four experienced EFL teachers using the Empirically-derived Descriptor-based Diagnostic (EDD) checklist, and the ratings were subjected to MRM analysis. Two distinct profiles of L2 writers emerged from the analyzed sample: (a) Sentence-Oriented and (b) Paragraph-Oriented L2 Writers. Sentence-Oriented L2 Writers tend to focus on linguistic features, such as grammar, vocabulary, and mechanics, at the sentence level and try to utilize these subskills to generate a written text. By contrast, Paragraph-Oriented Writers are inclined to move beyond the boundaries of a sentence and attend to the structure of a whole paragraph, drawing on higher-order features such as content and organization. The two profiles were further examined to capture their unique features. Finally, the theoretical and pedagogical implications of identifying L2 writing profiles and suggestions for further research are discussed.
April 2023
-
Abstract
Integrated tasks are increasing in popularity, either replacing or complementing writing-only independent tasks in writing assessments. This shift has generated considerable research interest in the underlying construct and features of integrated writing (IW) performances. However, due to the complexity of the IW construct, there are conflicting findings about whether, and to what extent, various language skills and IW text features correlate with IW scores. To understand the construct of IW, we conducted a meta-analysis to synthesize correlation coefficients between scores of IW performances and (1) other language skills and (2) text quality features of IW. We also examined factors that may moderate the correlation of IW scores with these two groups of correlates. Reading and writing skills showed stronger correlations with IW scores than listening did; among text features, text length showed the strongest correlation, followed by source integration, organization and syntactic complexity, with lexical complexity showing the smallest. Several IW task features affected the magnitude of correlations. The results support the view that IW is a construct that is independent of, albeit related to, other language skills, and that IW task features may affect the construct of IW.
July 2022
-
Abstract
Numerous studies have examined the relationship between lexical features of students’ compositions and judgements of text quality. However, the degree to which the quality of vocabulary in students’ essays influences teachers’ judgements of other textual characteristics is relatively unexplored. This experimental study investigates the influence of lexical features on teachers’ judgements of English as a second language (ESL) argumentative essays. Using analytic and holistic rating scales, English pre-service teachers (N = 37) in Switzerland assessed four essays of different proficiency levels in which the levels of lexical diversity and sophistication had been experimentally varied. Coh-Metrix software was used to manipulate the level of lexical diversity, as measured by MTLD and D, and the Tool for the Automatic Analysis of Lexical Sophistication (TAALES) was used to obtain differing levels of lexical sophistication, as measured by word range. The results suggest that texts with greater lexical diversity and sophistication were assessed more positively in terms of their overall quality as well as on the analytic criteria ‘grammar’ and ‘frame of essay’. The implications of this study for classroom practice and teacher education are discussed.
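The MTLD measure named above scores a text by the mean length of sequential token runs that keep the type-token ratio (TTR) above a threshold, conventionally 0.72. The sketch below is a simplified, forward-only pass of that idea; the actual McCarthy-Jarvis implementation used by Coh-Metrix averages a forward and a backward pass:

```python
def mtld_forward(tokens, threshold=0.72):
    """Forward pass of MTLD: tokens divided by the number of 'factors',
    i.e., successive runs whose TTR stays above the threshold."""
    factors = 0.0
    types, seg_len, ttr = set(), 0, 1.0
    for tok in tokens:
        types.add(tok.lower())
        seg_len += 1
        ttr = len(types) / seg_len
        if ttr <= threshold:              # run exhausted: one full factor
            factors += 1.0
            types, seg_len, ttr = set(), 0, 1.0
    if seg_len > 0:                       # partial credit for the last run
        factors += (1.0 - ttr) / (1.0 - threshold)
    # If the TTR never dropped below the threshold, no factor completed;
    # fall back to the token count itself.
    return len(tokens) / factors if factors else float(len(tokens))

# A maximally repetitive text scores low; a fully diverse one scores high.
print(mtld_forward(["a", "b"] * 50))
print(mtld_forward([str(i) for i in range(50)]))
```

Unlike a raw TTR, which shrinks as texts get longer, this run-length construction is largely insensitive to text length, which is why measures such as MTLD are preferred for comparing essays of unequal length.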