Extracting interpretable writing traits from a large language model

Paul Deane; Andrew Hoang

doi:10.1016/j.asw.2025.101011

Assessing Writing, January 2026, Open Access

Paul Deane (Educational Testing Service); Andrew Hoang (Educational Testing Service)

Abstract

Large language models (LLMs) are increasingly used to support automated writing evaluation (AWE), for both scoring and feedback. However, LLMs present challenges to interpretability, making it hard to evaluate the construct validity of scoring and feedback models. BIOT (best interpretable orthogonal transformations) is a new method of analysis that makes dimensions of an embedding interpretable by aligning them with external predictors. It was originally developed to improve the interpretability of multidimensional scaling models. This paper shows that BIOT can also be used to align LLM embeddings with an interpretable writing trait model, developed using multidimensional analysis of classical NLP features, that measures latent dimensions of writing style and writing quality. This makes it possible to determine whether an AWE model built using an LLM is aligned with known (and construct-relevant) dimensions of textual variation, supporting construct validity. Specifically, we examine the alignment between the hidden layers of DeBERTa, a small LLM that has been shown to be useful for a variety of natural language processing applications, and a writing trait model developed through factor analysis of classical features used in existing AWE models. Specific dimensions of transformed DeBERTa layers are strongly correlated with these classical factors. When the transformation matrix derived using BIOT is applied to token vectors, it is also possible to visualize which tokens in the original text contributed to high or low scores on a specific dimension.

• Large language models (LLMs) are increasingly used to support automated writing evaluation (AWE).
• LLMs present challenges to interpretability, making it hard to evaluate the construct validity of scoring and feedback models.
• BIOT is a new interpretation method that aligns embedding dimensions with external predictors.
• Specifically, BIOT can be used to align LLM embeddings with classical NLP measures of aspects of style and writing quality.
• This demonstrates a general method to determine whether an LLM latently represents construct-relevant dimensions.
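
The alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and not BIOT itself: it swaps in a plain Procrustes-style rotation (no sparsity penalty, no feature selection) and runs on synthetic data. The function name `align_embeddings`, the array shapes, and the random "embeddings" and "trait scores" are all assumptions made for illustration.

```python
import numpy as np

def align_embeddings(E, T):
    """Simplified stand-in for BIOT (assumption, not the published method).

    E: (n x d) document embeddings, T: (n x k) external trait scores.
    Returns a (d x k) matrix R with orthonormal columns that maximizes the
    total covariance between the rotated dimensions E @ R and the traits T.
    """
    Ec = E - E.mean(axis=0)          # center embeddings
    Tc = T - T.mean(axis=0)          # center trait scores
    U, _, Vt = np.linalg.svd(Ec.T @ Tc, full_matrices=False)
    return U @ Vt                    # polar factor of the cross-covariance

# Synthetic data: two "traits" hidden inside a randomly rotated 6-d embedding.
rng = np.random.default_rng(0)
n, d, k = 200, 6, 2
T = rng.normal(size=(n, k))                    # hypothetical trait scores
noise = rng.normal(size=(n, d - k))            # trait-irrelevant variation
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal mixing
E = np.hstack([T, noise]) @ Q                  # hypothetical embeddings

R = align_embeddings(E, T)
aligned = (E - E.mean(axis=0)) @ R

# Correlation between each aligned dimension and the matching trait.
corr = [np.corrcoef(aligned[:, j], T[:, j])[0, 1] for j in range(k)]

# The same matrix can be applied to token vectors: each row gives one
# token's position on the aligned dimensions. If a document vector is the
# mean of its token vectors, its score is the mean of the token scores.
tokens = rng.normal(size=(12, d))              # hypothetical token vectors
token_scores = tokens @ R
doc_score = tokens.mean(axis=0) @ R
```

Because the rotation is linear, per-token scores decompose the document-level score, which is what makes token-level visualization of a dimension possible in principle. The published BIOT method additionally uses regularization to keep the mapping to external predictors sparse, which this sketch omits.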

Journal: Assessing Writing
Published: 2026-01-01
Open Access: OA PDF Hybrid
Topics: assessment artificial intelligence


