Scott Crossley

8 articles

Who Reads Crossley

Scott Crossley's work travels primarily in Composition & Writing Studies (81% of indexed citations) · 27 total indexed citations from 3 clusters.

By cluster

  • Composition & Writing Studies — 22
  • Rhetoric — 3
  • Other / unclustered — 2

Counts include only citations from indexed journals that deposit reference lists with CrossRef. Authors whose readers publish primarily in venues without reference deposits will appear less central than they are.

  1. Assessing fairness in finetuned scoring models with demographically restricted training data
    Abstract

    The increasing adoption of automated essay scoring (AES) in high-stakes educational contexts necessitates careful examination of potential biases within the systems. This study investigates how the demographic composition of training data influences fairness in AES systems developed from finetuned large language models (LLMs). Using the PERSUADE corpus of 26,000 student essays, we conducted a systematic analysis using demographically restricted training sets to isolate the impact of training data demographics on LLM-AES performance. Each demographically restricted training set comprised essays written by one racial/ethnic group. Four variants of a Longformer-based AES were developed: one trained on demographically balanced data and three trained on demographically restricted datasets. An initial analysis of the human ratings indicated that demographic factors significantly predict human essay scores (marginal R² = 0.125), a pattern that is paralleled in national writing assessment data. LLM-AES systems trained on demographically restricted data exhibited small systematic biases (marginal R² = 0.043). However, the LLM trained on balanced data showed minimal demographic bias, suggesting that representative training data can effectively prevent amplification of demographic disparities beyond those present in human ratings. These results highlight both the importance and limitations of training data diversity in achieving fair assessment outcomes.

      • 12.5% of variance in human essay ratings was explained by demographics.
      • We construct demographically restricted training sets to isolate bias.
      • Balanced training data minimized LLM-AES bias across demographic groups.
      • LLM-AES trained on demographically restricted data showed more bias.

    doi:10.1016/j.asw.2026.101032
  2. Distinguishing effective writing styles in the PERSUADE corpus
    Abstract

    Many linguistic studies of writing assume a single linear relationship between linguistic features in the text and human judgments of writing quality. However, writing quality may be better understood as a complex latent construct that can be realized in a number of different ways through different linguistic profiles of high-quality writing styles, as shown in Crossley et al. (2014). This study builds on the exploratory study reported by Crossley et al. by analyzing a representational corpus of 4,170 highly rated persuasive essays written by secondary-school students. The study uses natural language processing tools to derive quantitative representations for the linguistic features found in the texts. These linguistic features inform a k-means cluster analysis, which indicates that a four-cluster profile best fits the data. By examining the indices most and least distinctive of each cluster, the study identifies a structured writing style, a conversational writing style, a reportive writing style, and an academic writing style. The findings support the notion that writers can employ a variety of writing profiles to successfully write an argumentative essay.

    doi:10.17239/jowr-2025.17.02.02
  3. The KLiCKe corpus: Keystroke logging in compositions for knowledge evaluation
    Abstract

    Despite the growing interest in the dynamics of the writing process in writing research, publicly available large-scale corpora of keystroke logs have been rare. We introduce KLiCKe, a freely available collection of keystroke logs for around 5,000 argumentative texts written by adults in the United States. The KLiCKe corpus also includes human-rated holistic scores for the essays as well as writers' demographic details, their typing skills, and vocabulary knowledge. We describe our methods for constructing the corpus and present descriptives for different components of the corpus. To illustrate the use of the KLiCKe corpus, we report a study using a subset of the corpus to investigate whether keystroke features are associated with holistic writing quality for L1 and L2 writers. The study shows that higher writing scores are related to shorter pauses in general, shorter between-word pauses, lower proportion of deletions, higher proportion of insertions, and less process variance. The KLiCKe corpus provides a robust resource for researchers to study the dynamics of text production and revision that will help spur the development of process-oriented tools and methodologies in writing assessment and instruction.

    doi:10.17239/jowr-2025.17.01.02
  4. Making sense of L2 written argumentation with keystroke logging
    Abstract

    This study examines associations between writing behaviors manifested by keystroke analytics and the formulation of argument elements in L2 undergraduate writers' writing processes. Ninety-nine persuasive essays written by L2 undergraduate writers were human annotated for Toulmin argument elements. The corresponding keystroke logs were segmented and analyzed to characterize the dynamics of writing processes for different categories of the elements. A multinomial mixed-effects logistic regression model was built to predict argument categories using the keystroke analytics. The study reported that L2 undergraduate writers' text production for final claims and primary claims featured P-bursts (execution processes delimited by pauses exceeding 2 seconds) of longer spans but lower production fluency compared to that for data. In addition, fewer revisions were observed when L2 writers were constructing final claims than when they were formulating data. These findings shed light on the varying cognitive loads and activities L2 undergraduate writers may experience when building different argument elements in written argumentation.

    doi:10.17239/jowr-2024.15.03.01
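
The P-burst definition in the abstract above (execution processes delimited by pauses exceeding 2 seconds) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the input format (a sorted list of keystroke timestamps in seconds), the `Burst` class, and the fluency measure (keystrokes per second within a burst) are assumptions made for illustration only.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_S = 2.0  # pause length named in the abstract


@dataclass
class Burst:
    """One P-burst: a run of keystrokes with no pause above the threshold."""
    start: float   # timestamp of the first keystroke (seconds)
    end: float     # timestamp of the last keystroke (seconds)
    n_keys: int    # burst span, counted here in keystrokes (an assumption)

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def fluency(self) -> float:
        """Keystrokes per second; 0.0 for a single-keystroke burst."""
        return self.n_keys / self.duration if self.duration > 0 else 0.0


def segment_p_bursts(timestamps, threshold=PAUSE_THRESHOLD_S):
    """Split sorted keystroke timestamps into bursts at pauses > threshold."""
    bursts = []
    if not timestamps:
        return bursts
    start = prev = timestamps[0]
    count = 1
    for t in timestamps[1:]:
        if t - prev > threshold:
            # The pause exceeds the threshold: close the current burst.
            bursts.append(Burst(start, prev, count))
            start, count = t, 0
        count += 1
        prev = t
    bursts.append(Burst(start, prev, count))
    return bursts


# Hypothetical keystroke log: two pauses longer than 2 seconds.
times = [0.0, 0.3, 0.6, 1.0, 4.5, 4.8, 5.0, 9.0]
bursts = segment_p_bursts(times)
for b in bursts:
    print(b.n_keys, b.duration, b.fluency)
```

Under these assumptions, a longer burst span with lower fluency would show up as a larger `n_keys` paired with a smaller `fluency` value.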
  5. Argumentation features and essay quality: Exploring relationships and incidence counts
    Abstract

    This study examines links between human ratings of writing quality and the incidence of argumentative features (e.g., claims, data) in persuasive essays along with relationships among these features and their distance from one another within an essay. The goal is to better understand how argumentation elements in persuasive essays combine to model human ratings of essay quality. The study finds that, in most cases, it is not the presence of argumentation features that is predictive of writing quality but rather the relationships between superordinate and subordinate features, parallel features, and the distances between features. This finding has not only theoretical value but also practical value in terms of pedagogical approaches and automated writing feedback.

    doi:10.17239/jowr-2022.14.01.01
  6. Linguistic features in writing quality and development: An overview
    Abstract

    This paper provides an overview of how analyses of linguistic features in writing samples provide a greater understanding of predictions of both text quality and writer development and links between language features within texts. Specifically, this paper provides an overview of how language features found in text can predict human judgements of writing proficiency and changes in writing levels in both cross-sectional and longitudinal studies. The goal is to provide a better understanding of how language features in text produced by writers may influence writing quality and growth. The overview will focus on three main linguistic constructs (lexical sophistication, syntactic complexity, and text cohesion) and their interactions with quality and growth in general. The paper will also problematize previous research in terms of context, individual differences, and reproducibility.

    doi:10.17239/jowr-2020.11.03.01
  7. Using human judgments to examine the validity of automated grammar, syntax, and mechanical errors in writing
    Abstract

    This study introduces GAMET, which was developed to help writing researchers examine the types and percentages of structural and mechanical errors in texts. GAMET is a desktop application that expands LanguageTool v3.2 through a user-friendly graphical user interface that affords the automatic assessment of writing samples for structural and mechanical errors. GAMET is freely available, works on a variety of operating systems, affords document batch processing, and groups errors into a number of structural and mechanical error categories. This study also tests LanguageTool’s validity using hand-coded assessment for accuracy and meaningfulness on first language (L1) and second language (L2) writing corpora. The study also examines how well LanguageTool replicates human coding of structural and mechanical errors in an L1 corpus as well as assesses associations between GAMET and human ratings of essay quality. Results indicate that LanguageTool can be used to successfully locate errors within text. However, while the accuracy of LanguageTool is high, the recall of errors is low, especially in terms of punctuation errors. Nevertheless, the errors coded by LanguageTool show significant correlations with human ratings of writing and grammar and mechanics errors. Overall, the results indicate that while LanguageTool fails to flag a number of errors, the errors flagged provide an accurate profile of the structural and mechanical errors made by writers.

    doi:10.17239/jowr-2019.11.02.01
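
The contrast in the abstract above between high accuracy and low recall can be made concrete with a small Python sketch. It assumes that "accuracy" is read as precision (the share of flagged errors that human coders also marked), which is an interpretation rather than something the abstract states. The error positions below are invented for illustration and are not data from the study.

```python
def precision_recall(flagged, human_coded):
    """Compare tool-flagged error positions with human-coded positions.

    precision = share of flagged errors that humans also coded
    recall    = share of human-coded errors that the tool flagged
    """
    flagged, human_coded = set(flagged), set(human_coded)
    true_positives = len(flagged & human_coded)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(human_coded) if human_coded else 0.0
    return precision, recall


# Hypothetical token positions of errors in one essay.
human_errors = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}  # coded by human raters
tool_errors = {1, 2, 3, 11}                      # flagged by the tool

precision, recall = precision_recall(tool_errors, human_errors)
print(precision, recall)
```

In this invented case most of what the tool flags is correct, yet most human-coded errors go unflagged, which is the pattern the abstract describes.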
  8. Classifying paragraph types using linguistic features: Is paragraph positioning important?
    Abstract

    This study examines the potential for computational tools and human raters to classify paragraphs based on positioning. In this study, a corpus of 182 paragraphs was collected from student argumentative essays. The paragraphs selected were initial, middle, and final paragraphs and their positioning related to introductory, body, and concluding paragraphs. The paragraphs were analyzed by the computational tool Coh-Metrix on a variety of linguistic features with correlates to textual cohesion and lexical sophistication and then modeled using statistical techniques. The paragraphs were also classified by human raters based on paragraph positioning. The performance of the reported model was well above chance and reported an accuracy of classification that was similar to human judgments of paragraph type (66% accuracy for human versus 65% accuracy for our model). The model’s accuracy increased when longer paragraphs that provided more linguistic coverage and paragraphs judged by human raters to be of higher quality were examined. The findings support the notion that paragraph types contain specific linguistic features that allow them to be distinguished from one another. The findings reported in this study should prove beneficial in classroom writing instruction and in automated writing assessment.

    doi:10.17239/jowr-2011.03.02.3