Assessing Writing

1016 articles

January 2025

  1. Examining EFL learners’ quantity and quality of uptake of teacher corrective feedback on writing across three different editing settings
    doi:10.1016/j.asw.2024.100911
  2. Examining the use of academic vocabulary in first-year ESL undergraduates’ writing: A corpus-driven study in Hong Kong
    Abstract

    A good command of academic vocabulary is important for academic success in higher education. However, research has primarily focused on the receptive academic vocabulary knowledge of L2 learners while devoting relatively limited attention to their productive use of such vocabulary and its impact on writing quality. To address this gap, we analysed the problem-solution essays written by 168 first-year undergraduates in Hong Kong, focusing on the relationship between their use of academic words in the Academic Vocabulary List (AVL) and the overall quality of their writing. We also explored the relationship between the size of students’ receptive academic vocabulary and the frequency of its use in writing. Findings revealed that essays with high scores contained a greater density and diversity of academic vocabulary than low-scored essays, with greater frequency of words in the 1–500 and 501–1000 tiers of the AVL significantly predicting better writing quality. The essays also showed a significant relationship between the participants’ receptive academic vocabulary size and the diversity of academic words used in writing. However, no significant relationship was observed between receptive academic vocabulary size and the density of academic words used. We highlight the implications of these findings for EAP teaching and research.

    • Problem-solution essays written by undergraduates in Hong Kong were analysed.
    • Density and diversity of academic vocabulary (AV) predict L2 writing quality.
    • Learners’ receptive AV size significantly relates to AV diversity in their writing.
    • Only words from two tiers of the AVL significantly predicted writing scores.
    • A holistic and tiered approach to assessing AV use is important.

    doi:10.1016/j.asw.2024.100913
  3. Connecting L2 reading emotions and writing performance through imaginative capacity in the story continuation writing task: A gender difference perspective
    doi:10.1016/j.asw.2025.100914
  4. A meta-analysis of relationships between syntactic features and writing performance and how the relationships vary by student characteristics and measurement features
    Abstract

    Students’ proficiency in constructing sentences impacts the writing process and writing products. Linguistic demands in writing differ in terms of both student characteristics and measurement features. To identify various syntactic demands considering these features, we conducted a meta-analysis examining the relationships between syntactic features (complexity and accuracy) and writing performance (quality, productivity, and fluency) and moderating effects of both student characteristics and measurement features. A total of 109 studies (effect sizes: 871; the total number of participants: 24,628) met the inclusion criteria. Results showed weak relationships between writing performance and both syntactic accuracy (r = .25) and syntactic complexity (r = .16). Writers' characteristics, including grade level and language proficiency, and measurement features, including writing genres, writing outcomes, whether the writing task is text-based or not, and type of syntactic complexity measures, were significant moderators for certain syntactic features. The findings highlighted the importance of writer and measurement factors when considering the relationships between linguistic features in writing and writing performance. Implications were discussed regarding the selection of syntactic features in assessing language use in writing, gaps in the literature, and significance for writing instruction and assessment.

    • Aimed to depict the relationships between syntactic features and writing performance.
    • Found weak relationships between syntactic features and writing outcomes.
    • Relationships vary as a function of student characteristics and measurement features.
    • Noun phrase complexity might be more valid than some traditional syntactic complexity measures.
    • Findings have important implications for writing assessments.

    doi:10.1016/j.asw.2024.100909
  5. Editorial Volume 63
    doi:10.1016/j.asw.2025.100917
  6. Examining the predictive power of L2 writing anxiety on L2 writing performance in simple and complex tasks under task-readiness conditions
    doi:10.1016/j.asw.2024.100912
  7. Investigating the effectiveness of scaffolded feedback on EFL Saudi students' writing accuracy: A longitudinal classroom-based study
    Abstract

    Despite the growing body of research on feedback provided to L2 learners on their writing, few studies have investigated the use of a scaffolded approach to feedback. Sociocultural scholars argue that for feedback to be effective it needs to be scaffolded – dynamic and aligned to the learner’s ability to correct their errors (Aljaafreh & Lantolf, 1994). Although research on scaffolded feedback has found it to improve L2 writing accuracy, most of this research has been small-scale, using one-on-one conferences. This larger classroom-based study aimed to examine the effectiveness of scaffolded written feedback and students’ perceptions of this feedback approach. The study was quasi-experimental and implemented over one academic semester. The participants were 71 male students of intermediate English proficiency, majoring in English at a large Saudi university. They were divided into two groups: one group received scaffolded feedback; the other group received unscaffolded (indirect) feedback. The feedback targeted eight grammatical structures. Findings from the immediate and delayed post-tests showed that both groups improved in their overall writing accuracy over time, with no difference evident between the two groups. Moreover, both groups showed similar improvements in six of the eight targeted grammatical structures. The scaffolded feedback group showed greater improvement than their counterparts only on two structures: subject-verb agreement and singular-plural agreement. Interview findings showed that the scaffolded feedback group liked this approach mainly because of its novelty but preferred scaffolding only when it increased in explicitness. We conclude by considering whether and how scaffolded feedback can be provided in classroom settings.

    • Scaffolded and unscaffolded written corrective feedback (WCF) both enhance EFL writing accuracy.
    • Scaffolded WCF shows limited superiority in improving writing accuracy compared to unscaffolded WCF.
    • Saudi EFL students preferred scaffolded WCF, with explicit feedback being more appreciated over time.
    • Implicit WCF posed challenges for Saudi EFL students, leading to reduced response rates as feedback became more implicit.

    doi:10.1016/j.asw.2024.100910

October 2024

  1. Effects of a genre and topic knowledge activation device on a standardized writing test performance
    Abstract

    The aim of this article was twofold: first, to introduce a design for a writing test intended for application in large-scale assessments of writing, and second, to experimentally examine the effects of employing a device for activating prior knowledge of topic and genre as a means of controlling construct-irrelevant variance and enhancing validity. An authentic, situated writing task was devised, offering students a communicative purpose and a defined audience. Two devices were utilized for the cognitive activation of topic and genre knowledge: an infographic and a genre model. The participants in this study were 162 fifth-grade students from Santiago de Chile, with 78 students assigned to the experimental condition (with activation device) and 84 students assigned to the control condition (without activation device). The results demonstrate that the odds of presenting good writing ability are higher for students who were part of the experimental group, even when controlling for text transcription ability, considered a predictor of writing. These findings hold implications for the development of large-scale tests of writing guided by principles of educational and social justice.

    • Genre and topic knowledge are forms of prior knowledge relevant to writing.
    • Higher odds for better writing in students exposed to prior knowledge activation.
    • Results support use of prior knowledge activation in standardized assessment.

    doi:10.1016/j.asw.2024.100898
  2. A comparative study of voice in Chinese English-major undergraduates’ timed and untimed argument writing
    doi:10.1016/j.asw.2024.100896
  3. Effects of writing feedback literacies on feedback engagement and writing performance: A cross-linguistic perspective
    doi:10.1016/j.asw.2024.100889
  4. A structural equation investigation of linguistic features as indices of writing quality in assessed secondary-level EMI learners’ scientific reports
    doi:10.1016/j.asw.2024.100897
  5. Exploring the use of model texts as a feedback instrument in expository writing: EFL learners’ noticing, incorporations, and text quality
    doi:10.1016/j.asw.2024.100890
  6. Exploring the development of noun phrase complexity in L2 English writings across two genres
    doi:10.1016/j.asw.2024.100892
  7. Detecting and assessing AI-generated and human-produced texts: The case of second language writing teachers
    doi:10.1016/j.asw.2024.100899
  8. Editorial
    doi:10.1016/j.asw.2024.100900
  9. Editorial Board
    doi:10.1016/s1075-2935(24)00097-7
  10. Validating an integrated reading-into-writing scale with trained university students
    Abstract

    Integrated tasks are often used in higher education (HE) for diagnostic purposes, with increasing popularity in lingua franca contexts, such as German HE, where English-medium courses are gaining ground. In this context, we report the validation of a new rating scale for assessing reading-into-writing tasks. To examine scoring validity, we employed Weir’s (2005) socio-cognitive framework in an explanatory mixed-methods design. We collected 679 integrated performances in four summary and opinion tasks, which were rated by six trained student raters who are to become writing tutors for first-year students. We utilized a many-facet Rasch model to investigate rater severity, reliability, consistency, and scale functioning. Using thematic analysis, we analyzed think-aloud protocols, retrospective and focus group interviews with the raters. Findings showed that the rating scale overall functions as intended and is perceived by the raters as a valid operationalization of the integrated construct. FACETS analyses revealed reasonable reliabilities, yet exposed local issues with certain criteria and band levels. This is corroborated by the challenges reported by the raters, which they mainly attributed to the complexities inherent in such an assessment. Applying Weir’s (2005) framework in a mixed-methods approach facilitated the interpretation of the quantitative findings and yielded insights into potential validity threats.

    • FACETS analyses show reasonable reliabilities and scale functioning.
    • Mixed-methods approach facilitates interpreting the quantitative findings.
    • Raters perceive rating scale as valid operationalization of integrated construct.
    • Applying Weir’s socio-cognitive framework reveals potential validity threats.
    • Raters attribute challenges to the complexities inherent in integrated writing.

    doi:10.1016/j.asw.2024.100894
  11. L2 master’s and doctoral students’ preferences for supervisor written feedback on their theses/dissertations
    doi:10.1016/j.asw.2024.100891
  12. The impact of task duration on the scoring of independent writing responses of adult L2-English writers
    Abstract

    In writing assessment, there is inherently a tension between authenticity and practicality: tasks with longer durations may more closely reflect real-life writing processes but are less feasible to administer and score. What is more, given total testing time, there is necessarily a trade-off between task duration and number of tasks. Traditionally, high-stakes assessments have managed this trade-off by administering one or two writing tasks per test, allowing 20–40 minutes per task. However, research on second language (L2) English writing has not found longer task durations to significantly improve score validity or reliability. Importantly, very few studies have compared much shorter durations for writing tasks to more traditional allotments. To explore this issue, we asked adult L2-English test takers to respond to two writing prompts with either 5-minute or 20-minute time limits. Responses were then evaluated by expert human raters and an automated writing evaluation tool. Regardless of scoring method, short-duration scores showed test-retest reliability and criterion validity as high as those of long-duration scores. As expected, longer task duration yielded higher scores, but regardless of duration, test takers demonstrated the entire spectrum of writing proficiency. Implications for writing assessment are discussed in relation to scoring practices and task design.

    • Longer writing tasks do not have higher test-retest reliability than shorter ones.
    • Longer writing tasks do not have higher criterion validity than shorter ones.
    • The impact of task duration is not mediated by scoring method (human or machine).

    doi:10.1016/j.asw.2024.100895
  13. Understanding the SSARC model of task sequencing: Assessing L2 writing development
    doi:10.1016/j.asw.2024.100893

July 2024

  1. Influence of prior educational contexts on directed self-placement of L2 writers
    Abstract

    Directed self-placement (DSP) allows for student agency in writing placement. DSP has been implemented in many composition programs, although it has not been used as widely for L2 writers in higher education. This study investigates the relationship between student placement decisions and students’ prior educational backgrounds, particularly in relation to whether they had attended an English-medium high school or an intensive English program (IEP). Actual placement results via an exam were compared to 804 students’ self-placement decisions and correlated with their prior educational backgrounds. Findings indicated that most students’ DSP decisions matched actual exam placement results. However, a large number of DSP decisions were higher or lower than exam placement results. Additionally, the longer students studied at an English-medium instruction high school, the more likely they were to place themselves higher than their exam placement. We conclude that DSP can be used in L2 writing programs, but with careful attention to learners’ educational backgrounds, proficiency, and sense of identity.

    doi:10.1016/j.asw.2024.100870
  2. Modeling relationships among large-grained, fine-grained absolute syntactic complexity and assessed L2 writing quality: An SEM approach
    doi:10.1016/j.asw.2024.100875
  3. Comparing Chinese L2 writing performance in paper-based and computer-based modes: Perspectives from the writing product and process
    doi:10.1016/j.asw.2024.100849
  4. Effects of peer feedback in English writing classes on EFL students’ writing feedback literacy
    doi:10.1016/j.asw.2024.100874
  5. How syntactic complexity indices predict Chinese L2 writing quality: An analysis of unified dependency syntactically-annotated corpus
    doi:10.1016/j.asw.2024.100847
  6. Construct representation and predictive validity of integrated writing tasks: A study on the writing component of the Duolingo English Test
    Abstract

    This study examined whether two integrated reading-to-write tasks could broaden the construct representation of the writing component of the Duolingo English Test (DET). It also verified whether they could enhance DET’s predictive power of English academic writing in universities. The tasks were (1) writing a summary based on two source texts and (2) writing a reading-to-write essay based on five texts. Both were given to a sample (N = 204) of undergraduates from Hong Kong. Each participant also submitted an academic assignment written for the assessment of a disciplinary course. Three professional raters double-marked all writing samples against detailed analytical rubrics. Raw scores were first processed using Multi-Faceted Rasch Measurement to estimate inter- and intra-rater consistency and generate adjusted (fair) measures. Based on these measures, descriptive analyses, sequential multiple regression, and Structural Equation Modeling were conducted (in that order). The analyses verified the writing tasks’ underlying component constructs and assessed their relative contributions to the overall integrated writing scores. Both tasks were found to contribute to DET’s construct representation and add moderate predictive power to the domain performance. The findings, along with their practical implications, are discussed, especially regarding the complex relations between construct representation and predictive validity.

    • Studied the concepts of construct representation (CR) and predictive validity (PV) within the context of an AI-facilitated language test (Duolingo English Test).
    • Revealed the complex relations between CR and PV.

    doi:10.1016/j.asw.2024.100846
  7. Examining teacher’s evaluative language in written, audio and screencast feedback on EFL learners’ writing from the appraisal framework: A linguistic perspective
    doi:10.1016/j.asw.2024.100871
  8. Corrigendum to “Assessing metacognition-based student feedback literacy for academic writing” [Assessing Writing 59 (2024) 100811]
    doi:10.1016/j.asw.2024.100869
  9. Navigating innovation and equity in writing assessment
    doi:10.1016/j.asw.2024.100873
  10. Editorial
    doi:10.1016/j.asw.2024.100879
  11. A teacher’s inquiry into diagnostic assessment in an EAP writing course
    doi:10.1016/j.asw.2024.100848
  12. Examining the direct and indirect impacts of verbatim source use on linguistic complexity in integrated argumentative writing assessment
    Abstract

    Verbatim source use (VSU) in integrated argumentative writing tasks may enhance linguistic complexity of writing performance. This assistance might present an unequal advantage for test-takers across levels of writing proficiency, engendering validity and fairness concerns. While previous research has mostly examined the relationships between source use characteristics and proficiency levels, the relationship between VSU and linguistic complexity remains underexplored. To further unpack these relationships, this study examined both the direct impact of VSU on linguistic complexity of writing performances and its indirect impact through interaction with writing proficiency. Using natural language processing tools and techniques, we examined 34 linguistic complexity features and three VSU features of 3250 argumentative writing performances on a university-level English Placement Test (EPT). We performed exploratory factor analysis to identify linguistic complexity dimensions and applied mixed-effect models to examine how VSU features and proficiency level impacted these dimensions. Post-hoc analyses suggested weak direct impacts of different VSU features on linguistic complexity, which might reflect different essay writing strategies. However, no meaningful indirect impact was found. The findings help unravel the impact of VSU on argumentative writing and provide empirical evidence for validity arguments for integrated writing assessments.

    doi:10.1016/j.asw.2024.100868
  13. Beyond accuracy gains: Investigating the impact of individual and collaborative feedback processing on L2 writing development
    Abstract

    Despite the burgeoning research on learner engagement with feedback, how second language (L2) learners’ engagement with feedback in different processing conditions influences their subsequent writing development is under-explored. This study examines the effects of individual and collaborative processing (languaging) of teacher feedback on Chinese lower-secondary school EFL learners’ writing development. Eighty-one students aged 13–14 with A1-A2 levels of English proficiency (according to the Common European Framework of Reference) from two classes and two experienced English teachers participated in the study. Students were provided with comprehensive teacher feedback and were asked to process feedback provided on three writing tasks through either individual written or collaborative oral languaging over six weeks. Pre-, post-, and delayed post-tests were administered. Students’ writing development was analysed using complexity, accuracy, and fluency measures, as well as content and organisation writing scores. Findings showed that the two conditions did not influence students’ writing complexity and fluency differently, while only the collaborative oral languaging condition contributed to students’ sustainable accuracy gains. Results based on the analytic writing scores suggested that students in the two conditions significantly improved content and organisation scores over time. Pedagogical and research implications regarding implementing the two feedback processing conditions are discussed.

    doi:10.1016/j.asw.2024.100876
  14. Matches and mismatches between Saudi university students' English writing feedback preferences and teachers' practices
    doi:10.1016/j.asw.2024.100863
  15. Does “more complexity” equal “better writing”? Investigating the relationship between form-based complexity and meaning-based complexity in high school EFL learners’ argumentative writing
    doi:10.1016/j.asw.2024.100867
  16. Thirty years of writing assessment: A bibliometric analysis of research trends and future directions
    doi:10.1016/j.asw.2024.100862
  17. Analysis and recommendation system-based on PRISMA checklist to write systematic review
    doi:10.1016/j.asw.2024.100866
  18. Exploring the multi-dimensional human mind: Model-based and text-based approaches
    doi:10.1016/j.asw.2024.100878
  19. Discourse competence in Hong Kong secondary students’ disciplinary research writing
    doi:10.1016/j.asw.2024.100872
  20. EvaluMate: Using AI to support students’ feedback provision in peer assessment for writing
    doi:10.1016/j.asw.2024.100864
  21. A large-scale corpus for assessing written argumentation: PERSUADE 2.0
    Abstract

    This research methods article introduces the open source PERSUADE 2.0 corpus. The PERSUADE 2.0 corpus comprises over 25,000 argumentative essays produced by 6th-12th grade students in the United States for 15 prompts on two writing tasks: independent and source-based writing. The PERSUADE 2.0 corpus also provides detailed individual and demographic information for each writer. The goal of the PERSUADE 2.0 corpus is to advance research into relationships between discourse elements, their effectiveness, writing quality, writing tasks and prompts, and demographic and individual differences.

    doi:10.1016/j.asw.2024.100865
  22. Editorial Board
    doi:10.1016/s1075-2935(24)00076-x
  23. EFL students' syntactic complexity development in argumentative writing: A latent class growth analysis (LCGA) approach
    Abstract

    The study explored EFL students' development of syntactic complexity by employing the Latent Class Growth Analysis (LCGA) approach. A total of 214 tertiary EFL students from Southwest China were invited to write four argumentative essays over an academic semester. The unconditional models of LCGA were utilized to explore the optimal latent classes of students' development trajectories of syntactic complexity. The conditional models of LCGA were employed to investigate the predictive effect of English proficiency on the optimal latent classes. Results of the unconditional models revealed different latent classes of development trajectories for six indices of syntactic complexity but not for the remaining ones, which offers tentative evidence for the heterogeneity of L2 development trajectories. Results of the conditional models showed that English proficiency did not predict the membership in these latent classes. These results are discussed and implications for L2 instruction are offered.

    doi:10.1016/j.asw.2024.100877

April 2024

  1. Linguistic factors affecting L1 language evaluation in argumentative essays of students aged 16 to 18 attending secondary education in Greece
    doi:10.1016/j.asw.2024.100844
  2. Exploring the effects of task difficulty and learner variables on performance on picture description writing tasks
    doi:10.1016/j.asw.2024.100827
  3. Book review
    doi:10.1016/j.asw.2024.100837
  4. Writing productivity development in elementary school: A systematic review
    Abstract

    The ability to produce fluent and coherent written text impacts learning and attainments. Valid and reliable assessments of writing are needed to monitor progression, develop goals for writing and identify struggling writers. In order to inform practice and research, a systematic review was conducted to investigate which writing productivity measures captured writing development and identified struggling writers in elementary school. Sixty-seven empirical studies were identified for inclusion, appraised, and their data extracted under the themes of writing genre, duration of writing task, use of priming of topic knowledge prior to the writing assessment, use of planning time, writing modality, gender, age of participants and learning difficulties. Total Number of Words and Correct Word Sequences were the most common means of measuring productivity. Productivity varied significantly between genres and durations of writing tasks and was higher in girls than boys. Students with learning difficulties scored significantly lower in writing productivity when compared to typically developing peers. Insufficient research was available to draw conclusions regarding the effects of priming of topic knowledge, planning and modality on writing productivity. Study limitations, links to the assessment of writing and recommended further research are discussed.

    doi:10.1016/j.asw.2024.100834
  5. Comparing trained EFL peer reviewers’ feedback: From claim to reality
    doi:10.1016/j.asw.2024.100836
  6. Book review
    doi:10.1016/j.asw.2024.100812
  7. Establishing analytic score profiles for large-scale L2 writing assessment: The case of the CET-4 writing test
    doi:10.1016/j.asw.2024.100826