Joon Suh Choi

1 article

Vanderbilt University

Assessing Writing, Apr 2026 (open access)

Assessing fairness in finetuned scoring models with demographically restricted training data

Langdon Holmes; Wesley Morris; Scott Crossley; Joon Suh Choi

Abstract

The increasing adoption of automated essay scoring (AES) in high-stakes educational contexts necessitates careful examination of potential biases within the systems. This study investigates how the demographic composition of training data influences fairness in AES systems developed from finetuned large language models (LLMs). Using the PERSUADE corpus of 26,000 student essays, we conducted a systematic analysis using demographically restricted training sets to isolate the impact of training data demographics on LLM-AES performance. Each demographically restricted training set comprised essays written by one racial/ethnic group. Four variants of a Longformer-based AES were developed: one trained on demographically balanced data and three trained on demographically restricted datasets. An initial analysis of the human ratings indicated that demographic factors significantly predict human essay scores (marginal R² = 0.125), a pattern that is paralleled in national writing assessment data. LLM-AES systems trained on demographically restricted data exhibited small systematic biases (marginal R² = 0.043). However, the LLM trained on balanced data showed minimal demographic bias, suggesting that representative training data can effectively prevent amplification of demographic disparities beyond those present in human ratings. These results highlight both the importance and limitations of training data diversity in achieving fair assessment outcomes.

• 12.5% of variance in human essay ratings was explained by demographics.
• We construct demographically restricted training sets to isolate bias.
• Balanced training data minimized LLM-AES bias across demographic groups.
• LLM-AES trained on demographically restricted data showed more bias.

assessment; artificial intelligence; race and writing

doi:10.1016/j.asw.2026.101032
