Augmenting AI scoring of essays with GPT-generated responses
Abstract
In this study, we examine the feasibility of augmenting student-written essays with essays generated by large language models (LLMs) for training essay scoring models. We found that, given appropriate instructions, generative AI systems such as GPT-4 and GPT-4o can generate essays that resemble student writing in surface-level linguistic features, although material differences may remain. Systematic analyses revealed that scoring models trained with synthetic data perform comparably to models trained on student essays, but performance varies across prompts and training sample sizes. The augmented models could alleviate large subgroup-level discrepancies between human and AI scores, which may arise from a lack of training samples for a particular subgroup or from inherent biases in LLMs. We also explored DecompX, an established token-importance method, to explain AI predictions. Future research directions and limitations of this study are also discussed.
- Journal: Journal of Writing Research
- Published: 2026-02-17
- DOI: 10.17239/jowr-2026.17.03.06
- Open Access: Diamond
Related Articles
- Accuracy and fairness of generative AI in automated essay scoring: Comparing GPT-4o, feature-based models, and human raters. Yue Huang; Corey Palermo; Joshua Wilson. Assessing Writing, Jul 2026.
- Educator perspectives on automated writing scoring and feedback for young language learners: Applying a fairness and justice lens. Jieun Kim; Mark Chapman; Lynn Shafer Willner; Jason A. Kemp; Ahyoung Alicia Kim. Assessing Writing, Jul 2026.
- LAWE-CL2: Multi-agent LLM-based automated writing evaluation system integrating linguistic features with fine-tuning for Chinese L2 writing assessment. Xuelin Wang; Qihao Yang; Yuxin Hao; Zhijun Wang; Sijia Guo. Assessing Writing, Jul 2026.
- Anchor is the key: Toward accessible automated essay scoring with large language model through prompting. Jaeyoon Choi; Tamara Tate; Mark Warschauer. Assessing Writing, Jul 2026.
- Investigating the impact of ChatGPT-assisted self-assessment on college students' writing development: Insights from diverse linguistic backgrounds. Hamidreza Moeiniasl. Assessing Writing, Jul 2026.