Pinakes — Rhetoric & Composition

All Journals

1453 articles

– Clear

assessment ×

July 2026

Assessing Writing Jul 2026

LAWE-CL2: Multi-agent LLM-based automated writing evaluation system integrating linguistic features with fine-tuning for Chinese L2 writing assessment ↗

Xuelin Wang; Qihao Yang; Yuxin Hao; Zhijun Wang; Sijia Guo

assessment artificial intelligence multilingual writers

doi:10.1016/j.asw.2026.101051

April 2026

Assessing Writing Apr 2026

Generative artificial intelligence for automated writing evaluation: A systematic review of trends, efficacy, and challenges ↗

Shadi I. Abudalfa; Jessie S. Barrot

assessment artificial intelligence book reviews

doi:10.1016/j.asw.2026.101041
Assessing Writing Apr 2026 OA PDF

Pursuing fair writing assessment: Halo effects in primary school foreign language writing in grade six ↗

Ruth Trüb; Julian Lohmann; Jens Möller; Stefan D. Keller

Abstract

Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is associated with specific challenges because of learners’ limited language resources. This study investigates the extent to which characteristics of their texts trigger so-called halo effects. Halo effects are an assessment bias where the quality of one feature unintentionally influences the evaluation of other aspects. The study examines halo effects across nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography and punctuation), based on a random sample of narrative texts from a sixth-grade corpus. 200 pre-service teachers assessed four randomly assigned texts. Halo effects were calculated by comparison to expert ratings using multi-level regression analyses. Results show that orthography and vocabulary were the two main triggers of halo effects. Punctuation also triggered some halo effects, but to a smaller extent. The assessment of communicative effect, complexity and correctness of syntax and grammar was not determined by the corresponding text quality but dominated by other criteria. Results highlight the importance of being aware of halo effects when assessing young EFL learners’ texts and emphasise the need for suitable training measures. • Analysis of halo effects across nine aspects of text quality. • Random sample of narrative texts from a sixth-grade EFL corpus. • Orthography and vocabulary are the two main triggers of halo effects. • Punctuation also triggers halo effects but to a smaller extent. • Halo effects call for awareness and targeted training.

assessment multilingual writers grammar and mechanics

doi:10.1016/j.asw.2026.101036
Assessing Writing Apr 2026 OA PDF

From spelling to content: The influence of spelling quality on text assessment ↗

Frederike Strahl; Jörg Kilian; Jens Möller

assessment grammar and mechanics

doi:10.1016/j.asw.2026.101014
Assessing Writing Apr 2026 OA PDF

How do L2 writing subskills interact hierarchically? Insights from diagnostic classification models ↗

Farshad Effatpanah; Hamdollah Ravand; Mahmoud Abdi Tabari; Yi-Hsin Chen; Olga Kunina-Habenicht

Abstract

This study examined the hierarchical structure among second/foreign language (L2) writing subskills using a Hierarchical Diagnostic Classification Model (HDCM). A pool of 500 essays composed by English as a Foreign Language (EFL) students was assessed by four experienced EFL teachers using the Empirically-derived Descriptor-based Diagnostic (EDD) checklist. Based on a literature review and the expertise of three content experts, several models were developed to reflect various hierarchical interactions among L2 writing subskills, including linear, divergent, convergent, independent, unstructured, mixed, and higher-order. The comparison of the models showed the presence of an unstructured interaction among L2 writing subskills, indicating that content is the foundational subskill for the mastery of vocabulary, grammar, organization, and mechanics. Higher mastery classes were also associated with higher educational levels, greater frequency of English use, and longer exposure to L2. Understanding the hierarchical relationships among L2 writing subskills can improve targeted instructional strategies and assessment practices. • A constrained version of existing DCMs is represented by hierarchical DCMs. • Models were developed to show hierarchical interactions among L2 writing subskills. • An unstructured interaction among L2 writing subskills was identified. • Higher mastery classes were associated with higher educational levels. • The classes were associated with greater English use and longer L2 exposure.

assessment multilingual writers grammar and mechanics

doi:10.1016/j.asw.2026.101029
Assessing Writing Apr 2026

Assessing GenAI-assisted digital multimodal composing: Reconceptualizing a genre-based framework through self-assessment and peer assessment ↗

Yunan Zhang; Zixuan Li

genre theory assessment multimodality

doi:10.1016/j.asw.2026.101017
Assessing Writing Apr 2026 OA PDF

Assessing fairness in finetuned scoring models with demographically restricted training data ↗

Langdon Holmes; Wesley Morris; Scott Crossley; Joon Suh Choi

Abstract

The increasing adoption of automated essay scoring (AES) in high-stakes educational contexts necessitates careful examination of potential biases within the systems. This study investigates how the demographic composition of training data influences fairness in AES systems developed from finetuned large language models (LLMs). Using the PERSUADE corpus of 26,000 student essays, we conducted a systematic analysis using demographically restricted training sets to isolate the impact of training data demographics on LLM-AES performance. Each demographically restricted training set comprised essays written by one racial/ethnic group. Four variants of a Longformer-based AES were developed: one trained on demographically balanced data and three trained on demographically restricted datasets. An initial analysis of the human ratings indicated that demographic factors significantly predict human essay scores (marginal R² = 0.125), a pattern that is paralleled in national writing assessment data. LLM-AES systems trained on demographically restricted data exhibited small systematic biases (marginal R² = 0.043). However, the LLM trained on balanced data showed minimal demographic bias, suggesting that representative training data can effectively prevent amplification of demographic disparities beyond those present in human ratings. These results highlight both the importance and limitations of training data diversity in achieving fair assessment outcomes. • 12.5% of variance in human essay ratings was explained by demographics. • We construct demographically restricted training sets to isolate bias. • Balanced training data minimized LLM-AES bias across demographic groups. • LLM-AES trained on demographically restricted data showed more bias.

assessment artificial intelligence race and writing

doi:10.1016/j.asw.2026.101032
Assessing Writing Apr 2026

Aligning ACTFL writing proficiency guidelines with CEFR descriptors: Insights from Chinese writing assessment ↗

Pei-Jiun Lan; Po-Hsi Chen; Shu-Huei Peng; Chia-Pi Chao; Tsung-An Wang; Wen-Hsun Tseng

assessment

doi:10.1016/j.asw.2026.101033

March 2026

Journal of Technical Writing and Communication Mar 2026

Canon to Code: Rhetorical Rulemaking for Generative AI Content Audits and Governance ↗

Miriam F. Williams

Abstract

This article proposes the Canon to Code (C2C) Auditing Framework for evaluating generative (artificial intelligence) AI output through classical rhetoric, arguing that AI's characteristic failures—guessing instead of knowing, politeness instead of credibility, and confidence instead of judgment—revisit problems that rhetoric has addressed since antiquity. Developed using a rulemaking methodology and drawing on classical rhetorical theory, this framework presents 10 auditing rules that operationalize rhetorical principles into evaluation criteria for AI-generated content, focusing on accuracy, transparency, and accountability. It offers content auditors, technical communicators, and compliance professionals a theoretically grounded method for distinguishing AI output that meets audience needs from output that simulates credibility through pattern matching.

classical rhetoric modern rhetorical theory rhetorical criticism assessment technical communication artificial intelligence

doi:10.1177/00472816261429907
Written Communication Mar 2026

Time, Space, and Tools: A Materio-cognitive Model of Digital Writing Process Development ↗

Matthew Overstreet; Silvia Vaccino-Salvadore; Diana Akhmedjanova

Abstract

This article uses a novel theoretical frame—materio-cognitivism—to explore how digital writing processes change with time and experience. Researchers observed 10 second langauge writers as they completed two research writing tasks—one at the start of their first year of university and one near the end of university. Interviews and screen recording were used to track writing activity. Five key writing strategies were identified. Among the most improved writers, researchers identified a set of shared changes in how writing strategies were deployed. In particular, the most improved writers showed increased ability to sequence subtasks, to arrange digital interfaces, and to combine internal cognitive functions with the affordances of digital tools. These findings suggest what the development of writing processes might look like in digital environments, potentially informing both writing pedagogy and assessment.

writing pedagogy assessment digital rhetoric

doi:10.1177/07410883251410154
IEEE Transactions on Professional Communication Mar 2026

US Hospital Educators' Technology Needs: A Qualitative Study for Developing Action-Oriented Technology ↗

Margaret Webb; Sweta Baniya; Alexa Smith; Nadra Rasberry; Ihudiya Ogbonnaya-Ogburu

Abstract

Background: Hospital educators are designated individuals who provide hospitalized K-12 children with their schooling during the time of their stay. They play a vital role in maintaining educational continuity for hospitalized children, yet their professional information and communication practices remain understudied in US settings. Literature review: We build on literature within technical and professional communication (TPC), specifically scholars who have studied technology and health in understanding US hospital educators' unique technological needs and communication practices within highly regulated healthcare environments. Research questions: How do hospital educators navigate professional communication, adapt teaching practices to meet diverse student needs, and utilize technology in hospital settings? What opportunities exist for artificial-intelligence (AI) integration? Research method: We conducted semistructured interviews with four hospital educators across US hospitals, applying reflexive thematic analysis, informed by Participatory Communication Theory, Sociotechnical Systems Perspectives, and Knowledge Justice. Analysis employed iterative open coding followed by theory-informed thematic development, where communication theory guided the identification of dialogical patterns, systems theory directed attention to sociotechnical interactions, and knowledge justice sensitized us to power dynamics affecting professional knowledge access and sharing. Results/discussion: Findings reveal characteristics of US hospital education contexts in our study: short patient stays, strict security requirements, institutional variability across hospital settings, and emphasis on engagement over assessment. Educators demonstrate remarkable adaptability in coordinating among stakeholders while navigating institutional constraints and developing strategies for rapid assessment and flexible instruction. While educational technologies offer benefits, implementation faces significant challenges regarding security, practical limitations, and offline functionality needs. Conclusion: We propose guideline themes for developing information and communication technologies–including some that use AI–that support hospital educators' professional needs while respecting hospital setting constraints. This research contributes to understanding how technologies can enhance hospital education while highlighting the importance of context-specific design that empowers rather than replaces educator expertise.

modern rhetorical theory teacher development assessment professional writing qualitative research

doi:10.1109/tpc.2026.3658847
IEEE Transactions on Professional Communication Mar 2026

Surveys as UXR: Using Design Thinking to Shape a Survey-Based UX Assessment for Rural Audiences ↗

Kaitlin J. Coyle; Derek G. Ross

Abstract

About the case: While several established user-experience research (UXR) methods can reach far-away users (e.g., remote usability testing), the digital divide makes implementation difficult, especially for rural populations facing barriers to transportation and high-speed internet. Situating the case: Web surveys can eliminate these concerns by providing customization for specific use cases, gathering both qualitative and quantitative data, and combining multiple questionnaires and/or UXR methods within them. Our case study demonstrates an instance where our lab—Auburn University's Lab for Usability, Communication, Interaction, and Accessibility—used advocacy-based HCD and design thinking (DT) to develop a nonstandard UXR Qualtrics web survey to solve our client's wicked problem: designing a usability test for rural audiences unable to travel to our lab while also considering time constraints and technological literacy. Methods: Our survey design followed the Nielsen Norman Group's adaptation of DT, and our process was informed by academic research on: 1. Survey design, question formats, and response bias, 2. Existing user-experience (UX)/usability methods, and 3. Mixed-methods approaches to UXR. Discussion: Our work suggests this tool can potentially serve as the UX testing situation itself, implementing multiple in-person research methods (i.e., heatmapping, user interviews, card sorting) virtually. Conclusion: We conclude with six survey design suggestions and a discussion of how this nonstandard UXR tool can reach underrepresented or vulnerable populations, serving to empower and advocate for users. We suggest that using DT to ideate new UXR methods is a means for UXR practitioners conducting future studies to better address the wicked problems they will face.

assessment technical communication qualitative research literacy studies

doi:10.1109/tpc.2026.3658115
IEEE Transactions on Professional Communication Mar 2026

Bridging the Gap: A Comparative Study of Students’ and LSPs’ Perceptions of Translation Internships ↗

Shuang Liu

Abstract

Background: Both technical professional communication (TPC) and translation training call for a closer academia-industry link to cultivate students’ professional competence and enhance employability. Among the collaborative efforts, the internship serves as a key part in bridging the gap and enhancing students’ work-readiness. Their effectiveness, however, depends on the alignment of expectations among the internship stakeholders. Literature review: While prior studies have examined translation internships, they typically center around either students or language service providers (LSP) in isolation. A significant gap exists in quantitatively comparing the perceptions of these two key stakeholder groups. Research questions: How do students and LSPs differ in their perceptions of internships? What factors contribute to the misalignment in stakeholders’ perceptions from the perspective of university educators and administrators? Methods: This study employed a mixed-methods approach. A survey was administered to translation students and LSP representatives to identify their perception differences across four key dimensions of internships, followed by interviews with university educators and administrators to explore the causes. Results: Quantitative analysis revealed statistically significant discrepancies in 18 of the 44 items. The subsequent qualitative interviews identified four primary factors contributing to these discrepancies: inadequate internship management, curriculum misalignment due to the lack of qualified faculty, emphasis on hard skills over soft skills in evaluation, and pragmatic concerns from both students and employers. Implication: The findings provided recommendations for students, employers, and institutions to improve the effectiveness of internships, which are relevant not only for translation but also for other practice-oriented disciplines like TPC.

collaborative writing assessment professional writing empirical research

doi:10.1109/tpc.2026.3658891
Argumentation Mar 2026 OA PDF

Experimental Insights into the Influence of Logic and Pragmatics on Conditional Argument Evaluation ↗

Ermioni Seremeta; Monique Flecken; Menno Reijven; Jean Wagemans

Abstract

Research on conditional reasoning has long debated whether human rationality is best captured by logicist accounts or by pragmatically oriented approaches such as Relevance Theory, which highlight contextual and communicative factors. While the former predict reliable adherence to logical schemata (e.g., Modus Ponens and Modus Tollens), experimental evidence consistently reveals systematic deviations, such as endorsement of invalid inferences. The latter view attributes such patterns not to irrationality, but to pragmatic expectations that guide interpretation. This study contributes to this debate by examining how logical validity and pragmatic congruency jointly shape the evaluation of conditional arguments. We report two experiments employing a 2 × 2 factorial design. In Experiment 1, participants evaluated conditional syllogisms framed in the standard 'if/then' format. Results showed that pragmatic violations slowed responses and, crucially, facilitated detection of logical invalidity, without hindering performance on valid arguments. Experiment 2 reformulated the same arguments using the Periodic Table of Arguments to replace 'if/then' conditionals with lever-based structures. Here, participants exhibited a generalized tendency to resist conditional inference, resulting in improved rejection of invalid arguments but reduced recognition of valid ones. Across both studies, pragmatic congruency alone did not predict accuracy, but interactions between pragmatic expectations and logical form systematically influenced evaluations. Taken together, the findings suggest that pragmatics does not override logic but modulates its accessibility: violations of pragmatic expectations invite deliberation. At the same time, semantic scaffolding, such as explicit 'if/then' cues, supports deductive reasoning. We propose that natural argumentation depends on this interplay, highlighting the need for situated accounts of logos.

discourse analysis argument assessment

doi:10.1007/s10503-026-09691-6

February 2026

Written Communication Feb 2026

Reading Medium and Communicative Purpose in Writing: Effects on Pausing Behaviour and Text Quality, Controlling for Reading Comprehension and Executive Functions ↗

Ángel Valenzuela; Cristian A. Rojas-Barahona; Ramón D. Castillo; Ladislao Salmerón

Abstract

This study investigated how reading medium (print vs. digital) and communicative purpose (informative vs. persuasive) shape writing processes and outcomes in integrative academic tasks. Eighty-one university students read three source texts in print or digitally and, after random assignment, produced either an informative or persuasive synthesis within a 2×2 between-subjects design. Keystroke logging recorded pausing across three writing stages, indexing planning, translation, and revision. Text quality was scored with holistic rubrics capturing discourse features and integration of sources. Reading medium significantly influenced pausing: students who read in print paused longer during writing, yet medium had no effect on overall text quality. Task purpose mattered: persuasive tasks yielded higher-quality formal writing, whereas scores reflecting level of source integration did not differ. No interaction between reading medium and task purpose emerged. When controlling for reading comprehension, working memory, and planning ability, the main effects of medium and task purpose remained, but period-specific pausing effects were no longer significant. Findings highlight distinct roles for reading medium and task purpose in shaping writing behavior and performance. The results support cautious causal interpretations and suggest that incorporating digital reading and varying task types may enhance academic writing in higher education, informing curriculum design and assessment.

discourse analysis curriculum design revision assessment

doi:10.1177/07410883251409662
Business and Professional Communication Quarterly Feb 2026

From Zero to $ocial Brand: The Guide to Positive LinkedIn Communication ↗

Huu Khang Nguyen; My Khanh Huynh; Huu Phuoc Tran

Abstract

This conceptual article develops a model of positive LinkedIn communication, arguing that responsive, affirming, and authentic interaction—organized into two higher-order behavioral dimensions—strengthens perceived support and trust, thereby shaping professional outcomes (e.g., recruitment, collaboration, and commercial opportunities). By shifting attention from static profile signals to communicative behaviors enacted in posts, comments, and messages, the framework advances testable propositions and specifies mechanisms, boundary conditions, and potential trade-offs that invite empirical evaluation across organizational and cultural contexts.

collaborative writing assessment

doi:10.1177/23294906261419056
Journal of Writing Research Feb 2026 OA PDF

Empirical studies of writing and generative AI: Introduction to the special issue ↗

Chris Anson; Kirsti Cole

Abstract

This special issue of the Journal of Writing Research brings together seven empirical studies of the relationship between writing and generative AI, examining what can be systematically observed and measured about the functioning of generative AI in educational and professional writing contexts. Collectively, the studies demonstrate the necessity and value of methodological pluralism for investigating a complex, rapidly evolving phenomenon. In their contributions, the researchers use experimental comparisons, mixed-methods intervention designs, corpus-based analyses, computational linguistic techniques, and qualitative interpretive approaches. Taken together, these methods enable lines of inquiry that no single approach could sustain: comparisons of AI and human performance in professional writing tasks; analyses of how writers at different ages and levels of expertise engage AI tools; examinations of how assessment systems register and respond to AI-generated prose; and investigations of how human readers interpret texts with ambiguous authorship. By foregrounding both the affordances and limitations of different methodological traditions, the articles present a multifaceted approach to the study of writing and generative AI.

assessment professional writing artificial intelligence

doi:10.17239/jowr-2026.17.03.01
Journal of Writing Research Feb 2026 OA PDF

Using AI to understand students’ self-assessments of their writing ↗

Madeleine Sorapure; Seth Erickson; Sarah Hirsch; Kenny Smith

Abstract

This study focuses on a generative AI approach to facilitate qualitative analysis in Writing Studies research. We gathered 13,336 one-sentence to one-paragraph responses written by 3,334 incoming students in a directed self-placement program administered at a large R1 U.S. university. In these responses, students describe their high school writing experience and college writing expectations. In stage one of the project, we pilot the use of Retrieval-Augmented Generation to expedite the selection of relevant responses for a topic—in this case, students’ positive self-assessments as writers. The selected responses were then compared to a random sample and rated by three faculty with writing expertise. In stage two, these faculty generated codes and themes from a subset of the responses, incorporating ChatGPT-4 through the stages of thematic analysis. Results show that the use of AI expedites and enhances qualitative analysis, but human participation in the process is still essential. We suggest a machine-in-the-loop framework with which Writing Studies researchers can more readily integrate generative AI to study large corpora of student writing.

first-year composition assessment qualitative research artificial intelligence

doi:10.17239/jowr-2026.17.03.07
Written Communication Feb 2026 OA PDF

Stance in REF Submissions: Authorial Positioning in Impact Narratives ↗

Ken Hyland; Feng (Kevin) Jiang

Abstract

The Research Excellence Framework (REF) is the U.K. government’s means of allocating funding to universities based on assessments of the research they produce. Conducted every five years, this exercise now includes not only the ‘quality’ of research but also its real-world ‘impact’. This helps determine the £7.16 billion distributed annually to universities and influences the reputations of institutions and academics. Writers are therefore keen to make the most persuasive argument for their work they can in these submissions through the narrative case studies that the submission requires. In this article, we examine all 6,361 case studies from the last exercise in 2021 to explore the rhetorical presentation of impact through an analysis of authorial stance. We found considerable use of self-mention, hedges, and boosters, with the hard science fields containing statistically significantly more markers and applied disciplines being particularly strong users. The study contributes to our understanding of stance in academic writing and the role of rhetorical persuasion in high-stakes assessment genres.

rhetorical criticism genre theory argument assessment

doi:10.1177/07410883251410160

January 2026

Business and Professional Communication Quarterly Jan 2026

Nontraditional Grading at the Nexus of Business, Communication, and Composition ↗

Michelle Cowan

Abstract

This article explores factors influencing classroom assessment approaches by analyzing survey data from 326 U.S. college instructors teaching business, communication, and composition. Business and communication instructors adopt nontraditional grading methods far less than composition instructors. Departmental culture and disciplinary norms are major influences, along with constraints like class size, time, and technology. The article argues that instructors can and should question departmental grading norms to develop assessment methods that enhance learning in interdisciplinary courses like business communication.

first-year composition teacher development assessment professional writing

doi:10.1177/23294906251399571
Assessing Writing Jan 2026 OA PDF

The effects of online resource use on L2 learners’ computer-mediated writing processes and written products ↗

Honglan Wang; Jookyoung Jung

Abstract

While previous studies on online resource use in L2 writing have focused on the overall writing quality, limited attention has been paid to its effects on linguistic complexity and real-time writing processes. Addressing this gap, the present study explored how online resource use influences both the processes and products of L2 writing. Forty-nine intermediate L2 learners completed two computer-mediated argumentative writing tasks, either with or without the use of online resources. Writing behaviors were captured via keystroke logging and screen recording, and analyzed for search activity, fluency, pausing, and revision quantity. Cognitive processes were examined through stimulated recall interviews, and written products were evaluated for both quality and linguistic complexity. The results showed that participants spent an average of 14 % of task time using online resources, with considerable individual variation. Mixed-effects modeling revealed that resource use facilitated the production of more sophisticated words, with marginal influence on writing quality or syntactic complexity. Resource use was also associated with longer between-word pauses, fewer within-word pauses, and reduced revisions. These findings highlight the potential of online resource use to enhance the authenticity of L2 writing assessment tasks without compromising test validity, while encouraging the use of more advanced vocabulary in writing. • Learners spent 14 % of the total writing task time using online resources. • Online resource use had no significant impact on L2 writing quality. • Online resource use improved lexical sophistication, not syntactic complexity. • Online resource use reduced within-word pauses and aided spelling retrieval. • Online resource use led to fewer revisions but did not affect fluency.

revision argument assessment digital rhetoric multilingual writers grammar and mechanics affect and writing

doi:10.1016/j.asw.2025.100994
Assessing Writing Jan 2026

Verb-centric or balanced?: An NLP-based assessment of word class contributions to L2 writing proficiency ↗

Hyunwoo Kim; Haerim Hwang

assessment multilingual writers

doi:10.1016/j.asw.2025.100997
Assessing Writing Jan 2026 OA PDF

Generative artificial intelligence for automated essay scoring: Exploring teacher agency through an ecological perspective ↗

Jessie S. Barrot

Abstract

Generative artificial intelligence (AI) is increasingly used in writing assessment, particularly for automated essay scoring (AES) and for generating formative feedback within automated writing evaluation (AWE). While AI-driven AES enhances efficiency and consistency, concerns regarding accuracy, bias, and ethical implications raise critical questions about its role in assessment. This paper examines the impact of generative AI on teacher agency through an ecological perspective, which considers agency as shaped by personal, institutional, and sociocultural factors. The analysis highlights the need for teachers to critically mediate AI-generated scores and feedback to align them with pedagogical goals, ensuring AI functions as an assistive tool rather than a determinant of assessment outcomes. Although AI can streamline assessment, over-reliance risks diminishing teachers’ evaluative expertise and reinforcing biases embedded in AI systems. Ethical concerns, including transparency, data privacy, and fairness, further complicate its adoption. To address these challenges, this paper proposes a framework for responsible AI integration that prioritizes bias mitigation, data security, and teacher-driven decision-making. The discussion concludes with pedagogical implications and directions for future research on AI-assisted writing assessment. • Teachers can actively mediate AI-generated scores to maintain agency. • Dependence on AES may weaken teachers’ evaluative skills. • Bias, data privacy, and AI opacity can undermine teachers’ decision-making. • AI literacy and hybrid assessment models can promote teacher autonomy. • A framework for protecting teacher agency in generative AI–based AWE is presented.

writing pedagogy teacher development assessment artificial intelligence literacy studies

doi:10.1016/j.asw.2025.100990
Assessing Writing Jan 2026 OA PDF

The relation between linguistic accuracy and scoring of Swedish EFL students’ writing during a high-stakes exam ↗

Christian Holmberg Sjöling

Abstract

This paper examines the effect of linguistic accuracy (e.g., the lack of form, grammatical, and lexical errors) on scoring during the high-stakes national test of English in Swedish upper secondary school. Teachers are expected to score their own students’ texts with the help of assessment instructions containing benchmark texts (i.e., texts representing different score bands). The assessment instructions and the score bands provided to guide scoring are not explicit about how accuracy should influence scores. Two research questions were answered: As measured by ordinal regression, to what extent does linguistic accuracy predict rater scores? Do the texts scored by teachers reflect the graded example texts in terms of how linguistic accuracy predicts scores? The results revealed, amongst other things, that overall frequency of errors in texts significantly predicted scores as the model explained approximately 58 % of the variance in the outcome variable according to Nagelkerke’s pseudo R-squared. Accuracy also had a similar effect on scores in texts rated by teachers as in the benchmark texts. In relation to the findings, it was concluded that accuracy may have more of an impact on scores than constructs that are more explicit components of the score bands such as lexical complexity.

assessment multilingual writers

doi:10.1016/j.asw.2025.100995
Assessing Writing Jan 2026 OA PDF

Extracting interpretable writing traits from a large language model ↗

Paul Deane; Andrew Hoang

Abstract

Large language models (LLMs) are increasingly used to support automated writing evaluation (AWE), both for purposes of scoring and feedback. However, LLMs present challenges to interpretability, making it hard to evaluate the construct validity of scoring and feedback models. BIOT (best interpretable orthogonal transformations) is a new method of analysis that makes dimensions of an embedding interpretable by aligning them with external predictors. It was originally developed to improve the interpretability of multidimensional scaling models. However, This paper shows that BIOT can be used to align LLM embeddings with an interpretable writing trait model developed using multidimensional analysis of classical NLP features to measure latent dimensions of writing style and writing quality. This makes it possible to determine whether an AWE model built using an LLM is aligned with known (and construct-relevant) dimensions of textual variation, supporting construct validity. Specifically, we examine the alignment between the hidden layers of deBERTA, a small LLM that has been shown to be useful for a variety of natural language processing applications, and a writing trait model developed through factor analysis of classical features used in existing AWE models. Specific dimensions of transformed deBERTA layers are strongly correlated with these classical factors. When the transformation matrix derived using BIOT is applied to token vectors, it is also possible to visualize which tokens in the original text contributed to high or low scores on a specific dimension. • Large language models (LLMs) are increasingly used to support automated writing evaluate (AWE). • LLMs present challenges to interpretability, making it hard to evaluate construct validity of scoring and feedback models. • BIOT is a new interpretation method that aligns embedding dimensions with external predictors. • Specifically, BIOT can be used to align LLM embeddings with classical NLP measures of aspects of style and writing quality. • This demonstrates a general method to determine whether an LLM latently represents construct-relevant dimensions.

assessment artificial intelligence

doi:10.1016/j.asw.2025.101011
Assessing Writing Jan 2026 OA PDF

How reliable and valid is peer evaluation in adolescents’ L2 argumentative writing? ↗

Albert W. Li; Steve Graham

Abstract

Peer evaluation is widely recognized for its educational benefits; however, its reliability and validity, particularly among adolescent second-language (L2) writers at the early stages of English language and literacy development, remain insufficiently explored. This explanatory sequential mixed-methods study investigated the reliability and validity of peer evaluation in English argumentative writing among 35 Grade 10 and 37 Grade 12 students from a public high school in Beijing, China. Twelve of the participating students (six at each grade) were interviewed about the validity, reliability, and value of peer evaluation. The findings indicated that peer evaluations demonstrated high levels of reliability and validity, with peer-assessed writing scores closely aligning with inter-teacher assessments. Notably, variations were observed among Grade 10 students, particularly in the evaluation of lower-order writing skills, such as grammar and vocabulary, which exhibited reduced validity. These results underscore the potential of peer evaluation in assessing higher-order content-level writing across varying levels of L2 English writing proficiency. The study also highlights areas where adolescent L2 writers may require additional support to enhance the effectiveness of peer evaluation practices in English argumentative writing. Implications for improving English argumentative writing instruction and refining peer evaluation strategies in high school L2 English classrooms are discussed. • Peer evaluation shows high reliability, similar to inter-teacher rating. • Peer evaluation works well for higher-order skills in L2 argumentative writing. • 10th graders struggled with evaluating lower-order skills like grammar. • 12th graders evaluate lower- and higher-order skills with greater validity than 10th graders.

writing pedagogy teacher development argument assessment multilingual writers grammar and mechanics literacy studies

doi:10.1016/j.asw.2025.100992
Assessing Writing Jan 2026 OA PDF

Assessing the effects of task complexity on cognitive demands in L2 writing ↗

Na Tao; Ying Wang

Abstract

The assessment of task-generated cognitive demands has been receiving increasing attention in task complexity research. However, scant attention has been paid to assessing cognitive demands when task complexity is manipulated along both resource-directing and resource-dispersing dimensions. To address this gap, the present study aimed to investigate the relative effects of reasoning demands and prior knowledge on cognitive demands in L2 writing. Eighty-eight EFL students completed two letter-writing tasks with varying reasoning demands under one of two conditions, that is, either with prior knowledge available or without prior knowledge available. Cognitive demands were assessed by the post-task questionnaire, the dual-task method and the open-ended questions. The results revealed that reasoning demands and prior knowledge were strong determinants of cognitive demands, which provided empirical evidence for Robinson’s Cognition Hypothesis. Moreover, the post-task questionnaire, the dual-task method and open-ended questions were found to assess distinct aspects of cognitive demands, which highlighted the importance of data triangulation in exploring task complexity effects. The study provides language teachers and assessors with implications for task design and implementation. • How reasoning demands and prior knowledge affect cognitive demands was underexplored. • Cognitive demands were assessed by both quantitative and qualitative methods. • Findings supported some assumptions underlying Robinson’s framework. • The independent measures assessed distinct aspects of cognitive demands.

assessment qualitative research multilingual writers affect and writing

doi:10.1016/j.asw.2025.100998
Assessing Writing Jan 2026 OA PDF

Assessing the effects of explicit coherence instruction on EFL students’ integrated writing performance ↗

Xi Li; Mo Chen

Abstract

As a key attribute of effective writing, coherence remains challenging to teach in language classrooms, with traditional writing instruction frequently overlooking coherence in favor of discrete, rule-based features. This mixed-methods study investigates the effectiveness of explicit coherence instruction on English-as-a-Foreign-Language (EFL) students’ performance on integrated writing tasks. The study employed a controlled experimental design with 64 upper-intermediate-level undergraduate students at a Chinese university, drawing on Hasan’s Cohesive Harmony theory as the theoretical framework. Half of the participants (n = 32) in the experimental group received explicit instruction on coherence with a focus on cohesive chains and cohesive devices in integrated writing, while the control group (n = 32) received standard paraphrasing instruction. Quantitative analysis revealed that the experimental group showed significant improvements in coherence scores and multiple cohesive chain measures. Qualitative discourse analysis of six students’ writing samples from the experimental group demonstrated varying levels of improvement in writing coherence, with high-performing students showing better use of identity chains and pronoun references. The findings revealed that explicit instruction on coherence significantly improved students’ performance in creating coherent integrated writing, particularly through the development of cohesive chains and appropriate use of cohesive devices. This study underscores the pedagogical value of teaching coherence to enhance writing quality and provides concrete strategies for developing more effective teaching approaches for integrated writing tasks in EFL contexts. • The study examined 64 Chinese EFL students using mixed-methods experimental design. • Cohesive Harmony theory served as the framework for assessing writing coherence. • Explicit instruction significantly improved coherence in integrated writing tasks. • High-performing students demonstrated superior identity chain development.

discourse analysis writing pedagogy graduate education teacher development assessment empirical research multilingual writers

doi:10.1016/j.asw.2026.101019
Journal of Writing Analytics Jan 2026 OA PDF

A Tribute to Robert J. Mislevy Part 1: Mapping the Skills of Tomorrow--Principled Assessment of Literacy and Numeracy Skills Embedded in U.S. Workplace Contexts ↗

Maria Elena Oliveri; Aria Immanuel

assessment literacy studies

doi:10.37514/jwa-j.2026.8.1.09
Pedagogy Jan 2026

“Not a Detour from Rigor” ↗

Krysten Stein; Kishonna Gray

Abstract

Abstract This article argues that care — especially care grounded in Black feminist traditions — is not an affective supplement to teaching but rather the radical foundation of liberatory pedagogy. Amid rising attacks on critical education and the austerity logics of the neoliberal university, the authors theorize care as infrastructure, method, and resistance. Drawing from the work of bell hooks, Audre Lorde, Patricia Hill Collins, Mia Mingus, and Leah Lakshmi Piepzna-Samarasinha, they offer a framework for care-centered teaching that foregrounds mutuality, trust, and collective accountability. Through vignettes, student reflections, and practices such as trauma-informed design, mutual aid, and collaborative assessment, the article demonstrates how care fosters relational transformation and deep intellectual engagement. It also interrogates the structural devaluation of care labor, particularly for women and faculty of color, and challenges dominant educational paradigms that equate rigor with detachment. As one student reflected, “You believed me when I said I needed more time, without asking for proof. That made me want to do the work even more.” Drawing from their institutional experiences, the authors position teaching as a form of organizing — an insurgent, relational practice that refuses extractive academic norms while building collective conditions for educational and institutional transformation.

writing pedagogy teacher development collaborative writing assessment gender and writing

doi:10.1215/15314200-12097258
Written Communication Jan 2026

Modeling Writing Processes and Predicting Text Quality in Technical Communication ↗

Zhijun Gao; Lin Dong; Jiangying Wang

Abstract

Combining keystroke logging, screen recordings, interviews, and text quality assessment in two mixed-methods studies with technical writers, this research (1) identifies defining variables of technical writing processes and (2) examines their correlations with and predictive power for text quality. Study 1, an exploratory investigation with 10 participants, identified 22 distinct writing behaviors under six categories of information searching, information reusing, content shaping, organization structuring, language styling, and layout designing during planning, translating, and reviewing sessions. These behavioral variables, together with time-related variables, were subsequently analyzed as “process indicators” in a comparative experiment with 43 participants across experience levels. Results of Study 2 revealed significant differences among experience levels in writing speed, planning duration, pause, search, reuse, content shaping, and structuring. Detailed planning and systematic content/structure editing were strongly associated with higher-quality texts. Building on these findings, we propose a process model of technical writing, explain its correlations with writing score, and depict process profiles of different experience levels. We also highlight the importance of information processing skills in enhancing writing efficiency, offering empirical guidance for technical writing instruction and professional training.

writing pedagogy assessment technical communication

doi:10.1177/07410883251372212

December 2025

Business and Professional Communication Quarterly Dec 2025

Design Thinking in Business and Professional Communication Pedagogy: A Review of Pedagogical Studies, 2014–2024 ↗

Jason Tham

Abstract

This review analyzes 59 studies from 2014 to 2024 examining design thinking integration in professional communication pedagogy across eight disciplinary journals. Design thinking has evolved from experimental use to systematic pedagogical approaches, with assignment-level integration proving most viable for educators. Empathy interviews and user research bridge design thinking principles with communication pedagogy’s audience awareness focus. Students show enhanced empathy, improved collaboration, and increased creative confidence with high motivation levels. Implementation challenges include time constraints, student resistance to ambiguity, and assessment difficulties. The study recommends scaffolded introduction, integration with existing content, and institutional support for desirable implementation in business and professional communication pedagogy.

writing pedagogy collaborative writing assessment professional writing book reviews

doi:10.1177/23294906251397613
Argumentation Dec 2025

An Analogy-Based Approach to Argument Evaluation ↗

Szymon Makuła

assessment

doi:10.1007/s10503-025-09655-2
Computers and Composition Digital Press Dec 2025 web OA PDF

The Black-Boxed Ideology of Automated Writing Evaluation Software ↗

Nupoor Ranade; Douglas Eyman

assessment artificial intelligence
Business and Professional Communication Quarterly Dec 2025

Rethinking Teacher-Student Communication in the AI Era ↗

Huu Phuoc Tran

Abstract

This article examines how artificial intelligence is transforming instructor-student communication and student evaluation in higher education. By comparing traditional and AI-mediated communication practices, the study synthesizes current literature on opportunities, challenges, and ethical considerations. The analysis highlights the need for digital literacy, emotionally intelligent AI tools, and balanced pedagogical strategies. Practical and theoretical propositions are provided to guide educators in leveraging AI while preserving human-centered teaching values.

writing pedagogy teacher development assessment digital rhetoric artificial intelligence literacy studies

doi:10.1177/23294906251356672
College Composition and Communication Dec 2025

Money Machine: Gig Writing, Automation, and Labor Troubles in Composition ↗

James Rushing Daniel

Abstract

Taking stock of the diminishing material conditions faced by contemporary writers broadly conceived, this article (re)frames writing as a site and a practice of exploited labor. Arguing that writing scholars have often avoided interrogating writing’s links to labor, particularly with respect to declining working conditions and the appropriation of value from workers, I draw attention to the pervasive crisis of writing’s devaluation under late capitalism. To evidence this assessment, I apply political economist Harry Braverman’s conception of the “progressive alienation of the process of production”—the notion that labor is increasingly eroded through capitalism’s advancement—to the scene of contemporary gig writing, specifically Amazon’s microtask platform Mechanical Turk (MTurk). MTurk, I maintain, offers a paradigmatic illustration of contemporary writers’ material exploitation, both for its efforts to de-skill writers and for its conscription of writers to advance their own exploitation by employing them to train generative AI.

assessment labor and working conditions artificial intelligence

doi:10.58680/ccc2025772243

October 2025

Argumentation Oct 2025 OA PDF

Finding the Missing Link: An Algorithmic Approach to Reconstructing Enthymemes ↗

Ameer Saadat-Yazdi; Jean H. M. Wagemans

Abstract

Abstract Enthymemes are arguments that are not fully articulated, often omitting a connection between premise and conclusion but sometimes also other information that is crucial for their interpretation. This implicitness poses challenges for the analysis and evaluation of argumentative discourse. We use the concept of “argument form” as employed in the argument classification framework of the Periodic Table of Arguments to address this issue. By developing an algorithmic procedure grounded in this concept, we provide a method for explicating missing statements and connections condensed in enthymemes. Our approach contributes to understanding the pragmatics of argumentation, as it offers a formal framework for analysing how the interpretation of implicit elements in argumentation arises from apparent non-sequiturs. The algorithmic procedure we developed can function as a guideline for human annotation of argumentative discourse and is also suitable for implementation in (AI-assisted) annotation software for argument mining.

discourse analysis argument assessment

doi:10.1007/s10503-025-09682-z
Argumentation Oct 2025 OA PDF

Practical Argumentation and Rhetorical Structure Theory ↗

Nancy Green

Abstract

This paper investigates the relationship between practical argumentation (PA) and Rhetorical Structure Theory (RST). PA is argumentation providing justification for an agent’s action. PA has been described in terms of a three-level structure composed of practical, evaluative, and classificatory argumentation schemes. RST is a linguistic theory that models the hierarchical structure of monological discourse in terms of discourse coherence relations. RST’s Motivation relation is intended to increase an agent’s inclination to perform some action. Our investigative approach was to analyze argumentation schemes of PA in examples of RST involving Motivation and to analyze RST structure for texts that have been used as examples of PA. The results of the investigation show uses, not only of Motivation, but also RST’s Antithesis, Concession, Evaluation, and Solutionhood. In some cases the RST analysis reflects the layered composition of argumentation schemes of PA.

rhetorical criticism discourse analysis argument assessment

doi:10.1007/s10503-025-09680-1
Composition Forum Oct 2025 RSS OA PDF

Collaborative and Equitable Assessment: Graduate Student Responses to Co-Creating Feedback Guidelines in a Graduate Composition Pedagogy Course ↗

Abstract

Megan McIntyre Abstract In response to a growing awareness of the oppressive foundations of educational institutions, literacy educators have turned to antiracist, culturally responsive (Alim and Paris; Paris), and equitable teaching and assessment practices to combat the inequities (colonialism, racism, sexism, homophobia, ableism, etc.) on which our institutions are built. According to scholars including Geneva […]

writing pedagogy graduate education teacher development collaborative writing assessment literacy studies race and writing gender and writing disability studies editorial matter
Composition Forum Oct 2025 RSS OA PDF

Sustaining Collective Actions: Program Assessment During Transitional Moments ↗

Abstract

Shane A. Wood, Nikolas Gardiakos, Matthew Bryan, Natalie Madruga, Pamela Baker, Joel Schneier, Joel Bergholtz, Emily Proulx, Vee Kennedy, Ricky Finch, Mya Poe, Norbert Elliot, and Sherry Rankins-Robertson Abstract The University of Central Florida’s First-Year Composition Program has sustained its commitment to values-based sustainable development despite a series of significant changes from 2020–2025. In this […]

first-year composition assessment
Assessing Writing Oct 2025

Assessing writing practices in higher education: Characterizing self-reported practices and identifying their determinants ↗

Dyanne Escorcia; Kiara Campo; Gabriela Navarro; Christine Ros

assessment

doi:10.1016/j.asw.2025.100976
Assessing Writing Oct 2025

Response time for English learners on large-scale writing assessments ↗

Catherine Welch; Stephen Dunbar; Jeongmin Ji; Annette Vernon; Junhee Park

assessment

doi:10.1016/j.asw.2025.100979
Assessing Writing Oct 2025

Comparing GPT-based approaches in automated writing evaluation ↗

Yingying Liu; Xiaofei Lu; Huilei Qi

assessment artificial intelligence

doi:10.1016/j.asw.2025.100961
Assessing Writing Oct 2025

Assessing L2 writing formality using syntactic complexity indices: A fuzzy evaluation approach ↗

Zhiyun Huang; Guangyao Chen; Zhanhao Jiang

assessment multilingual writers grammar and mechanics

doi:10.1016/j.asw.2025.100973
Assessing Writing Oct 2025 OA PDF

Judgment accuracy in primary school EFL writing assessment: Do text characteristics matter? ↗

Ruth Trüb; Jens Möller; Julian Lohmann; Thorben Jansen; Stefan D. Keller

Abstract

Assessing the writing competence of pupils learning English as a foreign language (EFL) at primary school is challenging. This study aimed at examining a largely unexplored topic, namely the role of text characteristics in writing assessment, and analysed judgment accuracy differentiated by nine aspects of text quality (communicative effect, level of detail, coherence, cohesion, complexity of syntax and grammar, correctness of syntax and grammar, vocabulary, orthography and punctuation). Two hundred pre-service teachers assessed four randomly assigned texts from learners in grade six. Their assessment was compared to the existing ratings of two experts from a previous study. We found a relative judgment accuracy between r = .34 and .60 for the nine assessment criteria, with vocabulary being assessed significantly more accurately than almost all other criteria. Orthography, complexity and correctness of syntax and grammar and punctuation were rated with significantly more accuracy than cohesion, level of detail, communicative effect and coherence. The pre-service teachers assessed most criteria more strictly and with higher variability than the experts. The results suggest that teacher education should offer pre-service teachers concrete opportunities to practise writing assessment, implement activities to strengthen the assessment of content- and structure-related criteria, and help them adjust their assessment rigour. • Judgment accuracy in the assessment of primary school EFL learners’ texts. • Relative judgment accuracy between r = .34 and .60 for the different criteria. • Significant differences in relative judgment accuracy between assessment criteria. • Linguistic text qualities are assessed with more accuracy than content- and structure-related aspects. • Pre-service teachers are more rigorous and heterogeneous in rating than experts.

teacher development assessment multilingual writers grammar and mechanics

doi:10.1016/j.asw.2025.100957
Assessing Writing Oct 2025 OA PDF

Exploring the scoring validity of holistic and dimension-based Comparative Judgements of young learners’ EFL writing ↗

Rebecca Sickinger; John Pill; Tineke Brunfaut

Abstract

Comparative Judgement (CJ) is a pairwise comparison evaluation method, typically conducted online. Multiple judges each compare the quality of a series of paired performances and, from their decisions, a rank order is constructed and scores calculated. Research across different educational contexts supports CJ’s reliability for evaluating written performances, permitting more precise scoring of scripts and for dimension-focused evaluation. However, scant insights are available about the basis of judges’ evaluations. This issue is important because argument-based approaches to validation (common in the field of language testing and adopted in this study) require evidence to support claims about how scores are appropriate for test purpose. Therefore, we investigate the scoring validity of CJ, both when used holistically (the standard application of CJ) and when evaluating scripts by individual criteria (termed dimensions in the research context). Twenty-seven judges evaluated 300 scripts addressing two writing task types in a national English as a Foreign Language examination for young learners in Austria. Judges reported via questionnaires what they had focused on while judging. Subsequently, eight judges provided think-aloud data while evaluating 157 scripts, offering further insight into the writing features they considered and their decision-making during CJ. Findings showed that while most judges adapted a decision-making process similar to traditional rating methods, some adapted their method to accommodate the nature of CJ evaluation. Furthermore, results indicated that the judges considered construct-relevant criteria when using CJ, both holistically and by dimension, thus offering support to an argument for the appropriateness of using CJ in this context. • Comparative Judgement can offer an alternative to analytic rating of EFL writing. • Judges with teaching or rating experience largely focus on relevant text features. • Some judges adopt a decision-making process that appears well suited to CJ. • Dimension-based CJ has the potential to provide richer feedback than holistic CJ.

teacher development assessment multilingual writers

doi:10.1016/j.asw.2025.100986
Assessing Writing Oct 2025 OA PDF

Which gender provides more specific peer feedback? Gender and assessment training’s effects on peer feedback specificity and intrapersonal factors ↗

José Carlos G. Ocampo; Ernesto Panadero; David Zamorano; Iván Sánchez-Iglesias

Abstract

This study investigated the effects of assessor gender (male vs. female), fictitious assessee gender (male vs. female), and assessment training (with vs. without) on peer feedback specificity (i.e. localisation and focus) and intrapersonal factors (i.e. trust in the self as an assessor and discomfort). This study involved 240 undergraduate psychology students (nMen=120, nWomen=120), with half receiving assessment training and the other half receiving the task instructions. Participants were divided into eight subgroups based on training condition and their self-reported gender to provide peer feedback to three writing samples (poor, average, excellent quality) by fictitious male or female peer assessees in Eduflow. A total of 3017 peer feedback segments were analysed, revealing that trained or untrained male and female assessors were comparable in most peer feedback specificity categories when assessing fictitious male or female assessees. Nonetheless, we also found that female assessors excelled in certain categories of peer feedback specificity, while male assessors also demonstrated competencies in other categories. Results also showed that assessors who received assessment training provided localised peer feedback in all the writing samples. Finally, gender and training did not affect participants’ trust in their abilities and (dis)comfort when providing peer feedback.

assessment gender and writing affect and writing

doi:10.1016/j.asw.2025.100987
Assessing Writing Oct 2025

Integrating move analysis and sentence reconstruction in automated writing evaluation for L2 academic writers ↗

Bo-Ren Mau; Hui-Hsien Feng

assessment artificial intelligence

doi:10.1016/j.asw.2025.100984
Assessing Writing Oct 2025

Exploring the cross-lingual influence of linguistic complexity in second language writing assessment ↗

Sara Geremia; Thomas Gaillat; Nicolas Ballier; Andrew J. Simpkin

assessment multilingual writers

doi:10.1016/j.asw.2025.100951
Assessing Writing Oct 2025

Predictive validity evidence for a no-stakes, untimed, machine-scored diagnostic writing assessment ↗

Elie ChingYen Yu; Oxana Rosca; Heidi L. Andrade; Angela M. Lui; Jason Bryer

assessment

doi:10.1016/j.asw.2025.100978