Scott A. Crossley
5 articles
Abstract
Idea generation is an important component of most major theories of writing. However, few studies have linked idea generation in writing samples to assessments of writing quality or examined links between linguistic features in a text and idea generation. This study uses human ratings of idea generation (idea fluency, flexibility, originality, and elaboration) to analyze the extent to which idea generation relates to human judgments of essay quality in a corpus of college student essays. In conjunction with this analysis, linguistic features extracted from the essays are used to develop a predictive model of idea generation to further understand relations between the language features in an essay and the idea generation scores assigned to that essay. The results indicate that essays rated as containing a greater number of ideas that were flexible, original, and elaborated were judged to be of higher quality. Two of these features (elaboration and originality) were significant predictors of essay quality scores in a regression analysis that explained 33% of the variance in human scores. The results also indicate that idea generation is strongly linked to language features in essays. Specifically, the use of unique multiword units, more difficult words, semantic but not lexical similarities between paragraphs, and fewer word repetitions explained 80% of the variance in human scores of idea generation. These results have implications for writing theories and writing practice.
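The regression analysis described in this abstract can be illustrated with a minimal sketch: an ordinary least-squares model predicting essay quality from idea-generation ratings and reporting the variance explained (R²). All data here are synthetic, and the column roles (fluency, flexibility, originality, elaboration) merely mirror the rated dimensions; this is not the study's actual model or data.

```python
import numpy as np

# Sketch of a multiple regression predicting essay quality scores from
# idea-generation ratings. Data are synthetic; the four columns stand in
# for rated fluency, flexibility, originality, and elaboration.
rng = np.random.default_rng(0)
n = 120
ratings = rng.normal(size=(n, 4))
true_betas = np.array([0.0, 0.2, 0.5, 0.6])  # invented weights for the demo
quality = ratings @ true_betas + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), ratings])   # add an intercept column
betas, *_ = np.linalg.lstsq(X, quality, rcond=None)
pred = X @ betas

# R^2: proportion of variance in the quality scores explained by the model
r2 = 1 - np.sum((quality - pred) ** 2) / np.sum((quality - quality.mean()) ** 2)
print(f"R^2 = {r2:.2f}")
```

In the same spirit, the study's reported 33% of variance in quality scores corresponds to an R² of 0.33 for its fitted model.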
-
What Is Successful Writing? An Investigation Into the Multiple Ways Writers Can Write Successful Essays
Abstract
This study identifies multiple profiles of successful essays via a cluster analysis approach using linguistic features reported by a variety of natural language processing tools. The findings from the study indicate that there are four profiles of successful writers for the samples analyzed. These four profiles are linguistically distinct from one another and demonstrate that expert human raters examine a number of different linguistic features in a variety of combinations when assessing writing proficiency and assigning high scores to independent essays (regardless of the scoring rubric considered). The writing styles in the four clusters can be described as action and depiction style, academic style, accessible style, and lexical style. The study provides empirical evidence that successful writing cannot be defined simply through a single set of predefined features, but that, rather, successful writing has multiple profiles. While these profiles may overlap, each profile is distinct.
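The cluster-analysis approach described here can be sketched as grouping essays by their linguistic-feature vectors and checking that distinct profiles emerge. The sketch below uses k-means with four clusters, echoing the four profiles reported; the feature matrix is synthetic (four artificial groups stand in for real NLP-derived measures), and k-means is only one of several clustering methods the study could have used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch: cluster essays by linguistic-feature vectors to recover distinct
# writing profiles. Four well-separated synthetic groups stand in for real
# essays measured on NLP-derived features.
rng = np.random.default_rng(1)
centers = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 2], [2, 2, 2]], dtype=float)
X = np.vstack([c + 0.2 * rng.normal(size=(30, 3)) for c in centers])

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(sorted(set(labels)))  # four distinct cluster labels
```

In practice the number of clusters is not fixed in advance; the study's finding of four profiles would come from examining cluster solutions and their linguistic distinctiveness.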
-
Abstract
In this study, a corpus of essays stratified by level (9th grade, 11th grade, and college freshman) is analyzed computationally to discriminate differences between the linguistic features produced in essays by adolescents and young adults. The automated tool Coh-Metrix is used to examine the degree to which essays written at various grade levels can be distinguished from one another using a number of linguistic features related to lexical sophistication (i.e., word frequency, word concreteness), syntactic complexity (i.e., the number of modifiers per noun phrase), and cohesion (i.e., word overlap, incidence of connectives). The analysis demonstrates that high school and college writers develop linguistic strategies as a function of grade level. Primarily, these writers produce more sophisticated words and more complex sentence structure as grade level increases. In contrast, these writers produce fewer cohesive features in text as a function of grade level. This analysis supports the notion that linguistic development occurs in the later stages of writing development and that this development is primarily related to producing texts that are less cohesive and more elaborate.
-
Abstract
In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imageability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). Of the 26 validated Coh-Metrix indices of cohesion, none showed differences between high- and low-proficiency essays, and none correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.
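The null result for cohesion amounts to finding near-zero correlations between cohesion indices and essay ratings. A minimal sketch of that kind of check, using Pearson correlation on synthetic data, is below; the variable names are invented ("overlap" stands in for a coreference-overlap index), and the two series are generated independently so their correlation should be small.

```python
import numpy as np

# Sketch: correlating one cohesion index with holistic essay ratings.
# Both series are synthetic and generated independently, so the observed
# correlation is expected to be small, mimicking a null result.
rng = np.random.default_rng(2)
ratings = rng.normal(loc=3.5, scale=0.8, size=100)  # holistic essay scores
overlap = rng.normal(loc=0.4, scale=0.1, size=100)  # stand-in cohesion index

r = np.corrcoef(overlap, ratings)[0, 1]
print(f"r = {r:.2f}")
```

A full analysis would repeat this for each index (the study reports 26 cohesion indices) and adjust for multiple comparisons before concluding that cohesion is unrelated to ratings.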