Paul Deane
3 articles
Abstract
We conducted a post hoc analysis of 771 students’ argumentative writing plans and essays in the Criterion® database, a digital writing tool, to explore the relations among plan features, essay quality, and writing traits. Students in the study were in Grades 5 to 10 from 68 schools. We found that older students produced writing plans that received higher scores and demonstrated greater genre-specific knowledge than those of younger students, but regardless of grade, most students did not consider alternative perspectives or rebut counterarguments in their writing plans. We also found that students’ choice of plan templates was associated with the scores of their plans. Further, factor analysis showed that six of the seven plan feature scores loaded together on a single factor (Factor 1) and correlated with multiple trait scores (Factor 2), accounting for most of the shared variance connecting plan scores with writing traits. The “both sides” plan feature loaded on a separate factor on its own, suggesting that considering different perspectives is a challenging skill that students may need extra support to develop.
-
Abstract
Large language models (LLMs) are increasingly used to support automated writing evaluation (AWE), both for scoring and for feedback. However, LLMs present challenges to interpretability, making it hard to evaluate the construct validity of scoring and feedback models. BIOT (best interpretable orthogonal transformations) is a new method of analysis that makes dimensions of an embedding interpretable by aligning them with external predictors. It was originally developed to improve the interpretability of multidimensional scaling models. However, this paper shows that BIOT can be used to align LLM embeddings with an interpretable writing trait model developed using multidimensional analysis of classical NLP features to measure latent dimensions of writing style and writing quality. This makes it possible to determine whether an AWE model built using an LLM is aligned with known (and construct-relevant) dimensions of textual variation, supporting construct validity. Specifically, we examine the alignment between the hidden layers of DeBERTa, a small LLM that has been shown to be useful for a variety of natural language processing applications, and a writing trait model developed through factor analysis of classical features used in existing AWE models. Specific dimensions of transformed DeBERTa layers are strongly correlated with these classical factors. When the transformation matrix derived using BIOT is applied to token vectors, it is also possible to visualize which tokens in the original text contributed to high or low scores on a specific dimension.
• Large language models (LLMs) are increasingly used to support automated writing evaluation (AWE).
• LLMs present challenges to interpretability, making it hard to evaluate construct validity of scoring and feedback models.
• BIOT is a new interpretation method that aligns embedding dimensions with external predictors.
• Specifically, BIOT can be used to align LLM embeddings with classical NLP measures of aspects of style and writing quality.
• This demonstrates a general method to determine whether an LLM latently represents construct-relevant dimensions.
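The idea in the second abstract, rotating an embedding so that some of its dimensions line up with external predictors, can be illustrated with a small sketch. This is not BIOT itself, whose details the abstract does not give. It substitutes a plain orthogonal Procrustes rotation as a simplified stand-in, and it runs on synthetic data rather than DeBERTa layers or real writing trait scores. The function names (`align_embedding`, `column_correlations`) and all sizes are hypothetical, chosen only for illustration.

```python
import numpy as np


def align_embedding(embedding, predictors):
    """Find a (d x k) matrix with orthonormal columns that rotates
    `embedding` (n x d) toward `predictors` (n x k), with k <= d.

    Assumption: this is ordinary orthogonal Procrustes, used here as a
    simplified stand-in for BIOT, not the published BIOT procedure.
    """
    E = embedding - embedding.mean(axis=0)    # center each embedding dimension
    F = predictors - predictors.mean(axis=0)  # center each external predictor
    # The orthonormal-column matrix closest to mapping E onto F comes from
    # the SVD of the cross-product matrix E^T F.
    U, _, Vt = np.linalg.svd(E.T @ F, full_matrices=False)
    return U @ Vt


def column_correlations(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return (a * b).mean(axis=0)


# Synthetic data: two "trait factors" hidden inside an 8-dimensional
# embedding by a random orthogonal mixing matrix.
rng = np.random.default_rng(0)
n, d, k = 500, 8, 2
factors = rng.normal(size=(n, k))
noise = rng.normal(size=(n, d - k))
mixing, _ = np.linalg.qr(rng.normal(size=(d, d)))
embedding = np.hstack([factors, noise]) @ mixing.T

rotation = align_embedding(embedding, factors)
aligned = embedding @ rotation
correlations = column_correlations(aligned, factors)
print(correlations)
```

In this toy setting each rotated dimension correlates strongly with the factor it was aligned to. The same `rotation` matrix could in principle be applied to individual token vectors to see which tokens push a dimension up or down, which is the kind of visualization the abstract mentions.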