Journal of Writing Analytics
5 articles
January 2021
January 2020
January 2017
-
Abstract
Aim: This research note narrates existing and continuing potential crossover between the digital humanities and writing studies. I identify synergies between the two fields’ methodologies and categorize current research in terms of four permutations, or “valences,” of the phrase “writing analytics.” These valences include analytics of writing, writing of analytics, writing as analytics, and analytics as writing. I bring recent work in the two fields together under these common labels, with the goal of building strategic alliances between them rather than delimiting them or being comprehensive. I offer the valences as one heuristic for establishing connections and distinctions between two fields engaged in complementary work without firm or definitive discursive borders. Writing analytics might provide a disciplinary ground that incorporates and coheres work from these different domains. I further hope to locate the areas in which my current research in digital humanities, grounded in archival studies, might most shape writing analytics.
Problem Formation: Digital humanities and writing studies are two fields in which scholars are performing massive data analysis research projects, including those in which the data are writing or the metadata that accompanies writing. There is an emerging environment in the Modern Language Association friendly to crossover between the humanities and writing studies, especially in work that involves digital methods and media. Writing analytics accordingly hopes to find common disciplinary ground with digital humanities, with the goal of benefitting from and contributing to conversations about the ethical application of digital methods to its research questions. Recent work to bridge digital humanities and writing studies more broadly has unfortunately focused more on territorial and usability concerns than on identifying resonances between the fields’ methodological and ethical commitments.
Information Collection: I draw from a history of meta-academic literature in digital humanities and writing studies to review their shared methodological commitments, particularly in literature that recognizes and responds to pushback against the fields’ ostensible use of extra-disciplinary methods. I then turn to current research in both fields that uses and critiques computational techniques, which is most relevant to writing analytics’ articulated focus on massive data analysis. Drawing from my categorization of this work, I provide a more detailed explanation of the conversations in digital humanities surrounding the digital archives that enable data analysis.
Conclusions: A review of past and current research in digital humanities and writing studies reveals shared attention to techniques for tokenizing texts at different scales for analysis, which is made possible by the curation of large corpora. Both fields are writing new genres to compose this analysis. In these genres, both fields emphasize process in their provisional work, which is sociocognitively repurposed in different rhetorical contexts. Finally, both fields recognize that the analytical methods they employ are themselves modes of composition and argumentation. An ethics of data transformation present in digital humanities, however, is largely absent from writing studies. This ethics comes to digital humanities from the influence of textual studies and archival studies. Further research in writing analytics might benefit from reframing writing corpora as archives in its analyses, a move that Paul Fyfe (2017) calls a shift from “data mining” to “data archaeology.” This is especially true for analyses of text, which foreground writing and the analysis of writing as acts of transformation.
Directions for Further Research: I recommend that future efforts to find crossover between digital humanities and writing studies do so by identifying their common values rather than trying to co-opt language and spaces or engaging in broad definitional work. I further provide a set of guiding principles that writing analytics might follow in order to pursue research that draws upon and contributes to both digital humanities and writing studies. These research projects might consider and account for the silences of writing corpora—unseen versions of documents, and documents’ elements not described in structured data—while attending to the silences that these efforts might in turn (re)produce.
-
Abstract
Background: Recent years have marked a shift of focus in the development of automated essay scoring (AES) systems, from merely assigning a holistic score to an essay to providing constructive feedback on it. Despite all the major advances in the domain, many objections persist concerning the credibility of these systems and their readiness to replace human scoring in high-stakes writing assessments. The purpose of this study is to shed light on how to build a relatively simple AES system based on five baseline writing features. The study shows that the proposed AES system compares very well with other state-of-the-art systems despite its obvious limitations.
Literature Review: In 2012, ASAP (Automated Student Assessment Prize) launched a demonstration to benchmark the performance of state-of-the-art AES systems using eight hand-graded essay datasets originating from state writing assessments. These datasets are still used today to measure the accuracy of new AES systems. Recently, Zupanc and Bosnic (2017) developed and evaluated another state-of-the-art AES system, called SAGE, which incorporated new semantic and consistency features and provided automatic semantic feedback for the first time. SAGE’s agreement level between machine and human scores for ASAP dataset #8 (the dataset also of interest in this study), measured by quadratic weighted kappa, was 0.81, while that of 10 other state-of-the-art systems ranged between 0.60 and 0.73 (Chen et al., 2012; Shermis, 2014). Finally, this section discusses the limitations of AES, which come mainly from its failure to assess the higher-order thinking skills that all writing constructs are ultimately designed to assess.
Research Questions: The research questions that guide this study are as follows: RQ1: What is the power of the writing analytics tool’s five-variable model (spelling accuracy, grammatical accuracy, semantic similarity, connectivity, lexical diversity) to predict the holistic scores of Grade 10 narrative essays (ASAP dataset #8)? RQ2: What is the agreement level between the computer rater based on the regression model obtained in RQ1 and the human raters who scored the 723 narrative essays written by Grade 10 students (ASAP dataset #8)?
Methodology: ASAP dataset #8 was used to train the predictive model of the writing analytics tool introduced in this study. Each essay was graded by two teachers. In case of disagreement between the two raters, the scoring was resolved by a third rater. Essay scores were the weighted sums of four rubric scores. A multiple linear regression analysis was conducted to determine the extent to which a five-variable model (selected from a set of 86 writing features) was effective in predicting essay scores.
Results: The regression model in this study accounted for 57% of the essay score variability. The correlation (Pearson), the percentage of perfect matches, the percentage of adjacent matches (±2), and the quadratic weighted kappa between the resolved scores and predicted essay scores were 0.76, 10%, 49%, and 0.73, respectively. The results were measured on an integer scale of resolved essay scores from 10 to 60.
Discussion: When measuring the accuracy of an AES system, it is important to take into account several metrics to better understand how predicted essay scores are distributed along the distribution of human scores. Using average ranking over correlation, exact/adjacent agreement, quadratic weighted kappa, and distributional characteristics such as standard deviation and mean, this study’s regression model ranks 4th out of 10 AES systems. Despite its relatively good rank, the predictions of the proposed AES system remain imprecise and do not appear optimal even for identifying poor-quality essays, treated as a binary condition of scores smaller than or equal to a 65% threshold (71% precision and 92% recall).
Conclusions: This study sheds light on the implementation process and the evaluation of a new simple AES system comparable to the state of the art and reveals that the generally obscure state-of-the-art AES system is most likely concerned only with shallow assessment of text production features. Consequently, the authors advocate greater transparency in the development and publication of AES systems. In addition, the relationship between the explanation of essay score variability and the inter-rater agreement level should be further investigated to better represent how the level of agreement changes when a new variable is added to a regression model. This study should also be replicated at a larger scale in several different writing settings for more robust results.
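The agreement statistics named in the Results above (perfect matches, adjacent matches within ±2, and quadratic weighted kappa) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `quadratic_weighted_kappa` and `agreement_rates` are hypothetical, the toy scores are invented, and counting exact matches as part of the adjacent rate is an assumption, since the abstract does not say how the two were separated.

```python
from collections import Counter


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Disagreements are penalized by the squared distance between the two
    scores, normalized by the squared width of the score scale.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("the score scale needs at least two categories")

    n = len(human)
    scale_width_sq = (max_score - min_score) ** 2
    observed = Counter(zip(human, machine))  # joint counts of (human, machine)
    hist_h = Counter(human)                  # marginal counts, human scores
    hist_m = Counter(machine)                # marginal counts, machine scores

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / scale_width_sq
            weighted_observed += weight * observed.get((i, j), 0) / n
            # Expected joint proportion if the two raters were independent.
            weighted_expected += weight * hist_h.get(i, 0) * hist_m.get(j, 0) / (n * n)

    if weighted_expected == 0:
        # Both raters used one and the same score throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def agreement_rates(human, machine, tolerance=2):
    """Return (exact, adjacent) agreement as proportions.

    Assumption: 'adjacent' counts every pair within `tolerance` points,
    including exact matches.
    """
    n = len(human)
    exact = sum(h == m for h, m in zip(human, machine)) / n
    adjacent = sum(abs(h - m) <= tolerance for h, m in zip(human, machine)) / n
    return exact, adjacent


# Invented scores on the 10-60 scale described above.
human_scores = [10, 20, 30, 40]
machine_scores = [10, 22, 35, 41]
kappa = quadratic_weighted_kappa(human_scores, machine_scores, 10, 60)
exact_rate, adjacent_rate = agreement_rates(human_scores, machine_scores)
```

Reporting the three figures side by side, as the Discussion recommends, shows different things: the exact and adjacent rates describe how often predictions land on or near the human score, while kappa also corrects for the agreement expected by chance given each rater's score distribution.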
-
Abstract
Aim: This research note focuses on some of the consequences of big data as an emerging methodology. Its purpose is to provide a brief literature review of the method’s development and some of the critical questions researchers should consider as they move forward. Salvo (2012) contends that big data as a form of design of communication itself “is necessarily a rhetorically-based field” (p. 38). With big data as an up-and-coming methodology (McNely, 2012; Salvo, 2012), using caution in its application is a necessity for scholars. Not only should researchers seek out the unseen and untapped applications of big data, but they should learn its limitations as well (Spinuzzi, 2009). You adopt a methodology, you adopt its flaws.
Problem Formation: This section identifies a gap in the field as it relates to some of the consequences of applying big data as a methodology and seeing it as a rhetorical tool. As big data gains steam in the humanities, some are sure to question what they see as a flaw: the act of quantifying language. This argument is not new, nor is its rebuttal. Harris (1954) discusses the distributional structure of language, with each part of a sentence acting as a co-occurrent, each in a particular position, and each with a relationship to the other co-occurrents (p. 146). Salvo (2012) argues that the combination of these new methodologies and technologies “knits together invention, arrangement, style, memory, and delivery in ways that challenge conceptions of print based literacy and textuality” (p. 39). While big data itself has several rhetorical methodologies embedded within it, deciding which one to use depends on the amount of data and how it’s aggregated.
Information Collection: As described above, this research note functions primarily as a brief review of literature. This section focuses on how writing analytics developed from content analysis in mass communications and shifted into latent semantic analysis assisted by computer technology. Riffe, Lacy, & Fico (1995) offer a clear explanation of content analysis, which was developed with comparably small data sets in mind: “Usually, but not always, content analysis involves drawing representative samples of content, training coders to use the category rules developed to measure or reflect differences in content, and measuring reliability (agreement or stability over time) of coders applying the rules” (p. 2). Finding a representative sample of content was once more feasible, but in the digital age the amount of content increases exponentially every day.
Conclusions: Because latent semantic analysis is an extension of quantitative content analysis (and vice versa), and because an adopted methodology carries adopted flaws, it makes sense to turn to some of the concerns voiced by mass communication scholars in order to understand its limitations. As quantitative content analysis grew in popularity in mass communication, its methods were refined as well. Reporting the reliability of a study adds credibility to the study itself, and when a human coder is involved, the reporting of this intercoder reliability becomes imperative (Hayes & Krippendorff, 2007; Krippendorff, 2008, 2011). While intercoder reliability measures the degree to which coders agree, researchers should also be keenly aware of the theory and valence informing their study, which impacts their coders, which ultimately impacts the results of the study itself.
Directions for Further Research: As the field of writing studies begins to adopt big data methodologies, researchers must continue to challenge and question their applications, implementations, and implications, turning to familiar questions from our own fields. Big data is exciting and new, but it’s not the methodology to explain it all. It’s just as rhetorical as every other methodology; it’s just better at hiding it.
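To make the intercoder reliability discussed in the Conclusions concrete, the sketch below computes raw percent agreement and Krippendorff's alpha for the simplest case: two coders, nominal categories, and no missing codes. It is an illustration under those stated assumptions, not a procedure taken from the works cited. The function names `percent_agreement` and `krippendorff_alpha_nominal` and the example codes are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units on which the two coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("code lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing codes.

    alpha = 1 - D_o / D_e, where D_o is the observed disagreement and D_e
    the disagreement expected by chance from the pooled category counts.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("code lists must be non-empty and of equal length")

    n_values = 2 * len(coder_a)  # every unit contributes two codes
    # Each disagreeing unit appears twice in the coincidence matrix,
    # once as (a, b) and once as (b, a).
    observed_mismatches = 2 * sum(a != b for a, b in zip(coder_a, coder_b))

    totals = Counter(coder_a) + Counter(coder_b)  # pooled category counts
    expected_mismatches = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    )
    if expected_mismatches == 0:
        raise ValueError("alpha is undefined when only one category is used")

    return 1.0 - (n_values - 1) * observed_mismatches / expected_mismatches


# Invented binary codes for ten units of content.
codes_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
codes_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
raw = percent_agreement(codes_a, codes_b)
alpha = krippendorff_alpha_nominal(codes_a, codes_b)
```

In this invented example the coders agree on six of ten units, yet alpha comes out close to zero because one category dominates and much of that agreement is expected by chance. This gap is one reason a chance-corrected statistic is usually reported alongside, or instead of, raw agreement.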