Extracting interpretable writing traits from a large language model

Paul Deane (Educational Testing Service); Andrew Hoang (Educational Testing Service)

Abstract

Large language models (LLMs) are increasingly used to support automated writing evaluation (AWE), for both scoring and feedback. However, LLMs present challenges to interpretability, making it hard to evaluate the construct validity of scoring and feedback models. BIOT (best interpretable orthogonal transformations) is a new method of analysis that makes the dimensions of an embedding interpretable by aligning them with external predictors. It was originally developed to improve the interpretability of multidimensional scaling models. This paper shows that BIOT can also be used to align LLM embeddings with an interpretable writing trait model, one developed through multidimensional analysis of classical NLP features to measure latent dimensions of writing style and writing quality. This makes it possible to determine whether an AWE model built using an LLM is aligned with known (and construct-relevant) dimensions of textual variation, which supports construct validity. Specifically, we examine the alignment between the hidden layers of DeBERTa, a small LLM that has been shown to be useful for a variety of natural language processing applications, and a writing trait model developed through factor analysis of classical features used in existing AWE models. Specific dimensions of the transformed DeBERTa layers are strongly correlated with these classical factors. When the transformation matrix derived using BIOT is applied to token vectors, it is also possible to visualize which tokens in the original text contributed to high or low scores on a specific dimension.

Highlights

• Large language models (LLMs) are increasingly used to support automated writing evaluation (AWE).
• LLMs present challenges to interpretability, making it hard to evaluate the construct validity of scoring and feedback models.
• BIOT is a new interpretation method that aligns embedding dimensions with external predictors.
• Specifically, BIOT can be used to align LLM embeddings with classical NLP measures of aspects of style and writing quality.
• This demonstrates a general method to determine whether an LLM latently represents construct-relevant dimensions.
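
The alignment idea described in the abstract can be illustrated with a small sketch. This is not the published BIOT algorithm or the authors' code. It is a simplified stand-in that assumes an alternating scheme: a sparse (L1-penalized) regression of the rotated embedding on external predictors, followed by an orthogonal Procrustes update of the rotation. All function names, the penalty `lam`, and the synthetic data are hypothetical; in practice `X` would hold document embeddings taken from a model layer and `F` the classical trait factor scores.

```python
import numpy as np


def soft_threshold(a, t):
    """Elementwise soft-thresholding, the proximal map of the L1 penalty."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def sparse_weights(F, T, lam, n_iter=200):
    """Approximately minimize (1/2n)||T - F W||_F^2 + lam*||W||_1 with ISTA."""
    n = F.shape[0]
    step = n / np.linalg.norm(F, 2) ** 2  # 1 / Lipschitz constant of the gradient
    W = np.zeros((F.shape[1], T.shape[1]))
    for _ in range(n_iter):
        grad = F.T @ (F @ W - T) / n
        W = soft_threshold(W - step * grad, step * lam)
    return W


def align(X, F, lam=0.05, n_outer=20):
    """Alternate a sparse regression step with an orthogonal Procrustes step.

    X: (n_docs, d) embeddings; F: (n_docs, k) external predictors.
    Returns an orthogonal matrix R (d, d) and sparse weights W (k, d)
    such that X @ R is approximated by F @ W.
    """
    R = np.eye(X.shape[1])
    for _ in range(n_outer):
        W = sparse_weights(F, X @ R, lam)
        # Procrustes: the orthogonal R minimizing ||X R - F W||_F is U V^T,
        # where U S V^T is the SVD of X^T (F W).
        U, _, Vt = np.linalg.svd(X.T @ (F @ W))
        R = U @ Vt
    W = sparse_weights(F, X @ R, lam)  # refit weights for the final rotation
    return R, W


def token_scores(token_vectors, R, dim):
    """Project each token vector onto one rotated dimension."""
    return token_vectors @ R[:, dim]


# Synthetic demonstration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, d = 200, 4
F = rng.standard_normal((n, d))                    # stand-in trait factors
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # hidden rotation
X = F @ Q.T + 0.1 * rng.standard_normal((n, d))    # stand-in embeddings

R, W = align(X, F)
tokens = rng.standard_normal((12, d))              # stand-in token vectors
scores = token_scores(tokens, R, dim=0)
```

Because the transformation is linear, a document's score on a rotated dimension equals the mean of its per-token scores whenever the document embedding is the mean of its token vectors (mean pooling is an assumption here). That linearity is what allows per-token values to be read as contributions to a high or low document-level score.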

Journal
Assessing Writing
Published
2026-01-01
DOI
10.1016/j.asw.2025.101011
Open Access
OA PDF Hybrid
