Scott A. Crossley

16 articles
Georgia State University · ORCID: 0000-0002-5148-0273


[Interactive panels: Publication Timeline · Co-Author Network · Research Topics]

Who Reads Crossley

Scott A. Crossley's work travels primarily in Composition & Writing Studies (77% of indexed citations) · 161 total indexed citations from 5 clusters.

By cluster

  • Composition & Writing Studies — 124
  • Rhetoric — 26
  • Other / unclustered — 5
  • Digital & Multimodal — 3
  • Technical Communication — 3

Counts include only citations from indexed journals that deposit reference lists with CrossRef. Authors whose readers publish primarily in venues without reference deposits will appear less central than they are. See coverage notes.
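The cluster percentages reported above follow directly from the raw counts; a minimal sketch, using only the numbers listed on this page:

```python
# Recompute each cluster's share of indexed citations from the counts above.
counts = {
    "Composition & Writing Studies": 124,
    "Rhetoric": 26,
    "Other / unclustered": 5,
    "Digital & Multimodal": 3,
    "Technical Communication": 3,
}
total = sum(counts.values())  # 161 indexed citations
shares = {name: round(100 * n / total) for name, n in counts.items()}
# Composition & Writing Studies: 124 / 161 rounds to 77%
```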

  1. A large-scale corpus for assessing source-based writing quality: ASAP 2.0
    Abstract

    This paper introduces ASAP 2.0, a dataset of ∼25,000 source-based argumentative essays from U.S. secondary students. The corpus addresses the shortcomings of the original ASAP corpus by including demographic data, consistent scoring rubrics, and source texts. ASAP 2.0 aims to support the development of unbiased, sophisticated Automatic Essay Scoring (AES) systems that can foster improved educational practices by providing summative feedback to students. The corpus is designed for broad accessibility with the hope of facilitating research into writing quality and AES system biases.
    • We introduce the ASAP 2.0 corpus.
    • The corpus contains over 25,000 source-based essays.
    • Each essay is scored for overall writing quality.
    • The corpus can be used to computationally and quantitatively model source-based writing quality.

    doi:10.1016/j.asw.2025.100954
  2. The persuasive essays for rating, selecting, and understanding argumentative and discourse elements (PERSUADE) corpus 1.0
    Abstract

    This paper introduces the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE) corpus. The PERSUADE corpus is a large-scale corpus of writing with annotated discourse elements. The goal of the corpus is to spur the development of new, open-source scoring algorithms that identify discourse elements in argumentative writing, opening new avenues for automatic writing evaluation systems that focus more specifically on the semantic and organizational elements of student writing.

    doi:10.1016/j.asw.2022.100667
  3. Linguistic Features of Writing Quality and Development: A Longitudinal Approach
    doi:10.37514/jwa-j.2022.6.1.04
  4. Corrigendum to “Modeling second language writing quality: A structural equation investigation of lexical, syntactic, and cohesive features in source-based and independent writing” [Assess. Writ. 37C (2018) 39–56]
    doi:10.1016/j.asw.2018.09.002
  5. Assessing writing with the tool for the automatic analysis of lexical sophistication (TAALES)
    doi:10.1016/j.asw.2018.06.004
  6. Modeling second language writing quality: A structural equation investigation of lexical, syntactic, and cohesive features in source-based and independent writing
    doi:10.1016/j.asw.2018.03.002
  7. Applying Natural Language Processing Tools to a Student Academic Writing Corpus: How Large are Disciplinary Differences Across Science and Engineering Fields?
    Abstract

    • Background: Researchers have been working for decades toward a better understanding of differences in professional disciplinary writing (e.g., Ewer & Latorre, 1969; Hu & Cao, 2015; Hyland, 2002; Hyland & Tse, 2007). Recently, research has taken important steps toward understanding disciplinary variation in student writing. Much of this research is corpus-based and focuses on lexico-grammatical features in student writing as captured in the British Academic Written English (BAWE) corpus and the Michigan Corpus of Upper-level Student Papers (MICUSP). The present study extends this work by analyzing lexical and cohesion differences among disciplines in MICUSP. Critically, we analyze linguistic differences not only in macro-disciplines (science and engineering) but also in micro-disciplines within them (biology, physics, industrial engineering, and mechanical engineering).
    • Literature Review: Hardy and Römer (2013) used a multidimensional analysis to investigate linguistic differences across four macro-disciplines represented in MICUSP. Durrant (2014, in press) analyzed vocabulary in texts produced by student writers in the BAWE corpus by discipline and level (year), as well as disciplinary differences in lexical bundles. Ward (2007) examined lexical differences within micro-disciplines of a single discipline.
    • Research Questions: The research questions that guide this study are as follows: 1. Are there significant lexical and cohesive differences between science and engineering student writing? 2. Are there significant lexical and cohesive differences between micro-disciplines within science and engineering student writing?
    • Research Methodology: To address the research questions, student-produced science and engineering texts from MICUSP were analyzed with regard to lexical sophistication and textual features of cohesion. Specifically, 22 indices of lexical sophistication calculated by the Tool for the Automatic Analysis of Lexical Sophistication (TAALES; Kyle & Crossley, 2015) and 38 cohesion indices calculated by the Tool for the Automatic Analysis of Cohesion (TAACO; Crossley, Kyle, & McNamara, 2016) were used. These features were then compared both across science and engineering texts (addressing Research Question 1) and across micro-disciplines within science and engineering (biology and physics; industrial and mechanical engineering) using discriminant function analyses (DFA).
    • Results: The DFAs revealed significant linguistic differences, not only between student writing in the two macro-disciplines but also between the micro-disciplines. Differences in classification accuracy based on students’ years of study hovered at about 10%. An analysis of classification accuracies by paper type found they were similar for larger and smaller sample sizes, providing some indication that paper type was not a confounding variable in classification accuracy.
    • Discussion: The findings provide strong support that macro-disciplinary and micro-disciplinary differences exist in student writing in these MICUSP samples and that these differences are likely not related to student level or paper type. These findings have important implications for understanding disciplinary differences. First, they confirm previous research that found the vocabulary used by different macro-disciplines to be “strikingly diverse” (Durrant, 2015), but they also show a remarkable diversity of cohesion features. The findings suggest that the common understanding of the STEM disciplines as “close” bears reconsideration in linguistic terms. Second, the lexical and cohesion differences between micro-disciplines are large and consistent enough to suggest that each micro-discipline carries a unique linguistic profile of features. Third, the differences discerned in the NLP analysis are evident at least as early as the final year of undergraduate study, suggesting that students at this level already have a solid understanding of the conventions of the disciplines whose members they aspire to become. Moreover, the differences are relatively homogeneous across levels, which confirms findings by Durrant (2015) but, importantly, extends them to include cohesion markers.
    • Conclusions: The findings from this study provide evidence that macro-disciplinary and micro-disciplinary differences at the linguistic level exist in student writing, not only in lexical use but also in text cohesion. A number of pedagogical applications of writing analytics are proposed based on the reported findings from TAALES and TAACO. Further studies using different corpora (e.g., BAWE) or purpose-assembled corpora are suggested to address limitations in the size and range of text types found within MICUSP. This study also points the way toward studies of disciplinary differences using NLP approaches that capture data beyond the lexical and cohesive features of text, including part-of-speech tags, syntactic parsing, indices of syntactic complexity and similarity, rhetorical features, and more advanced cohesion metrics (latent semantic analysis, latent Dirichlet allocation, Word2Vec approaches).

    doi:10.37514/jwa-j.2017.1.1.04
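The discriminant function analyses in the study above can be illustrated in miniature with a two-class Fisher discriminant on two features. Everything below is invented toy data standing in for the 60 TAALES/TAACO indices the study actually used; this is a sketch of the technique, not the authors' pipeline:

```python
# Two invented features per text; two "disciplines" as toy classes.

def mean(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]

def fisher(g0, g1):
    """Two-class Fisher discriminant: w = S_w^-1 (m1 - m0), midpoint cutoff."""
    m0, m1 = mean(g0), mean(g1)
    S = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((g0, m0), (g1, m1)):       # pooled within-class scatter
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    S[i][j] += d[i] * d[j]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    cut = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, cut                              # predict class 1 when w.x > cut

# Hypothetical (lexical sophistication, cohesion) scores for two groups
science = [(0.2, 0.8), (0.3, 0.9), (0.25, 0.75), (0.35, 0.85)]
engineering = [(0.7, 0.3), (0.8, 0.2), (0.75, 0.35), (0.65, 0.25)]
w, cut = fisher(science, engineering)
accuracy = sum((w[0] * x + w[1] * y > cut) == lab
               for lab, grp in ((False, science), (True, engineering))
               for x, y in grp) / 8
```

Classification accuracy on held-in data, as here, is optimistic; the study's DFA reports are the place to look for validated accuracies.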
  8. Idea Generation in Student Writing: Computational Assessments and Links to Successful Writing
    Abstract

    Idea generation is an important component of most major theories of writing. However, few studies have linked idea generation in writing samples to assessments of writing quality or examined links between linguistic features in a text and idea generation. This study uses human ratings of idea generation, such as idea fluency, idea flexibility, idea originality, and idea elaboration, to analyze the extent to which idea generation relates to human judgments of essay quality in a corpus of college student essays. In conjunction with this analysis, linguistic features extracted from the essays are used to develop a predictive model of idea generation to further understand relations between the language features in an essay and the idea generation scores assigned to that essay. The results indicate that essays rated as containing a greater number of ideas that were flexible, original, and elaborated were judged to be of higher quality. Two of these features (elaboration and originality) were significant predictors of essay quality scores in a regression analysis that explained 33% of the variance in human scores. The results also indicate that idea generation is strongly linked to language features in essays. Specifically, the use of unique multiword units, more difficult words, semantic but not lexical similarities between paragraphs, and fewer word repetitions explained 80% of the variance in human scores of idea generation. These results have implications for writing theories and writing practice.

    doi:10.1177/0741088316650178
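A quick sketch of what "explained 33% of the variance" means in the abstract above: it is the R² of a regression model. The minimal simple-linear-regression version, on toy numbers that are not the study's data:

```python
def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot   # share of variance explained by the fit

# Hypothetical elaboration scores vs. essay quality ratings
elaboration = [1.0, 2.0, 3.0, 4.0]
quality = [2.0, 4.0, 5.0, 9.0]
r2 = r_squared(elaboration, quality)
```

The study's 33% and 80% figures come from multiple regressions over several predictors, but the variance-explained interpretation is the same.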
  9. Say more and be more coherent: How text elaboration and cohesion can increase writing quality
    Abstract

    This study examines links between essay quality and text elaboration and text cohesion. For this study, 35 students wrote two essays (on two different prompts) and, for each, were given 15 minutes to elaborate on their original text. An expert in discourse comprehension then modified the original and elaborated essays to increase cohesion, resulting in a 2 (prompt) x 2 (original content, elaborated content) x 2 (original cohesion, improved cohesion) design. Expert raters scored the essays for overall quality and text coherence. In terms of overall essay quality, increasing text content (i.e., elaboration) and improving cohesion both led to significant gains in expert judgments of writing quality, and a combination of elaboration and improved cohesion led to higher scores than improved cohesion alone. Judgments of text coherence were increased by improved cohesion (but not by elaboration), and a combination of both elaboration and improved cohesion led to higher human ratings of coherence than the original and elaborated versions. The results have important implications for writing theories, writing success, writing pedagogy, and standardized testing.

    doi:10.17239/jowr-2016.07.03.02
  11. A hierarchical classification approach to automated essay scoring
    doi:10.1016/j.asw.2014.09.002
  12. The Writing Pal Intelligent Tutoring System: Usability Testing and Development
    doi:10.1016/j.compcom.2014.09.002
  13. What Is Successful Writing? An Investigation Into the Multiple Ways Writers Can Write Successful Essays
    Abstract

    This study identifies multiple profiles of successful essays via a cluster analysis approach using linguistic features reported by a variety of natural language processing tools. The findings from the study indicate that there are four profiles of successful writers for the samples analyzed. These four profiles are linguistically distinct from one another and demonstrate that expert human raters examine a number of different linguistic features in a variety of combinations when assessing writing proficiency and assigning high scores to independent essays (regardless of the scoring rubric considered). The writing styles in the four clusters can be described as action and depiction style, academic style, accessible style, and lexical style. The study provides empirical evidence that successful writing cannot be defined simply through a single set of predefined features, but that, rather, successful writing has multiple profiles. While these profiles may overlap, each profile is distinct.

    doi:10.1177/0741088314526354
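The cluster-analysis idea behind the four writer profiles can be sketched with a minimal k-means on invented two-dimensional feature vectors; the study clustered many NLP indices and its clustering method may differ in detail:

```python
import random

def kmeans(points, k, iters=25, seed=0):
    """Plain k-means: assign each point to its nearest center, recompute centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        centers = [tuple(sum(vals) / len(g) for vals in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Invented 2-D "style" vectors forming two obviously distinct writer profiles
profiles = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85),
            (0.9, 0.1), (0.8, 0.2), (0.85, 0.15)]
centers, groups = kmeans(profiles, k=2)
```

With well-separated data like this, the two recovered groups match the two profiles exactly; real linguistic feature spaces overlap, which is why the study stresses that its four profiles are distinct yet overlapping.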
  14. Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study
    doi:10.1016/j.asw.2013.05.002
  15. The Development of Writing Proficiency as a Function of Grade Level: A Linguistic Analysis
    Abstract

    In this study, a corpus of essays stratified by grade level (9th grade, 11th grade, and college freshman) is analyzed computationally to discriminate differences between the linguistic features produced in essays by adolescents and young adults. The automated tool Coh-Metrix is used to examine the degree to which essays written at various grade levels can be distinguished from one another using a number of linguistic features related to lexical sophistication (i.e., word frequency, word concreteness), syntactic complexity (i.e., the number of modifiers per noun phrase), and cohesion (i.e., word overlap, incidence of connectives). The analysis demonstrates that high school and college writers develop linguistic strategies as a function of grade level. Primarily, these writers produce more sophisticated words and more complex sentence structure as grade level increases. In contrast, these writers produce fewer cohesive features in text as grade level increases. This analysis supports the notion that linguistic development occurs in the later stages of writing development and that this development is primarily related to producing texts that are less cohesive and more elaborate.

    doi:10.1177/0741088311410188
  16. Linguistic Features of Writing Quality
    Abstract

    In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated so as to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imageability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). Of the 26 validated indices of cohesion from Coh-Metrix, none showed differences between high- and low-proficiency essays, and no indices of cohesion correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.

    doi:10.1177/0741088309351547
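The Measure of Textual Lexical Diversity (MTLD) named above counts how many times the running type-token ratio falls below a threshold (conventionally 0.72) as a text is read. A simplified forward-only sketch; the published measure averages forward and backward passes, so this is illustrative rather than the exact index:

```python
def mtld_forward(tokens, threshold=0.72):
    """Simplified one-directional MTLD: mean tokens per TTR 'factor'."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count < threshold:
            factors += 1              # TTR dropped below threshold: one full factor
            types, count = set(), 0
    if count:                         # partial factor for the leftover segment
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    return len(tokens) / factors if factors else float(len(tokens))

repetitive = ["a", "b"] * 30                      # very low lexical diversity
varied = ["w%d" % (i % 15) for i in range(60)]    # moderate lexical diversity
```

Longer factors mean the writer sustains varied vocabulary for longer stretches, so more diverse texts score higher; `mtld_forward(varied)` comes out well above `mtld_forward(repetitive)`.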