Augmenting AI scoring of essays with GPT-generated responses

Mo Zhang; Akshay Badola; Matthew Johnson; Chen Li

doi:10.17239/jowr-2026.17.03.06

Journal of Writing Research, Feb 2026

Mo Zhang (Educational Testing Service); Akshay Badola (Center for Assessment); Matthew Johnson; Chen Li (Educational Testing Service)

Abstract

In this study, we examine the feasibility of augmenting student-written essays with essays generated by large language models (LLMs) for use in essay scoring. We found that, with appropriate instructions, generative AI systems such as GPT-4 and GPT-4o can generate essays similar to those written by students in terms of surface-level linguistic features, although material differences may still exist. Systematic analyses revealed that scoring models trained with synthetic data perform comparably to models trained using student essays, but performance varies across prompts and with the size of the model training sample. The augmented models could alleviate large discrepancies between human and AI scores at the subgroup level that may be introduced by a lack of training samples for a particular subgroup or by inherent biases in LLMs. We also explored an established token-importance method, DecompX, to identify and explain AI predictions. Future research directions and limitations of this study are also discussed.

Journal: Journal of Writing Research
Published: 2026-02-17
DOI: 10.17239/jowr-2026.17.03.06
Open Access: OA PDF Diamond
Topics: artificial intelligence
