Journal of Writing Analytics
8 articles
January 2026
January 2025
January 2019
January 2018
-
Abstract
Background: Research incorporating large data sets and data and text mining methodologies is making initial contributions to writing studies. In writing program administration (WPA) work, one could best characterize the body of publications as small but growing, led by such work as Moxley and Eubanks’ 2015 “On Keeping Score: Instructors' vs. Students' Rubric Ratings of 46,689 Essays” and Arizona State University’s Science of Learning & Educational Technology (SoLET) Lab. Given the information that large-scale textual analysis can provide, it seems incumbent on program administrators to explore ways to make regular and aggressive use of such opportunities to give both students and instructors more resources for learning and development. This project is one attempt to add to this corpus of work; the sample for the study consisted of 17,534 pieces of student writing representing 141,659 discrete comments on that writing, with 58,300 unique words out of over 8.25 million total words written. This data is used to examine trends in the program’s instructor commentary over five years’ time. In doing so, this study revisits a fundamental task of writing instruction, responding to student writing, and considers from the results how large writing programs with constant turnover of graduate teaching assistants (GTAs) might manage their ongoing instructor professional development and how those GTAs will improve their ability to teach and respond to writing. Literature Review: Researchers have attempted to unpack and understand the task of instructor commentary for several decades; the published literature demonstrates a complex and occasionally ambivalent relationship with this central task of writing instruction. Recent scholarship has moved from the small-scale studies long used by the field to implement large-scale examinations of the instruction occurring in writing programs. 
Research questions: Three questions guided the inquiry: Does the work of new instructors (MA1s) more closely resemble the lexicon of novice or experienced responders to student writing? How does the new instructors’ work compare to that of more experienced (PHD1 or INS) instructors in the program throughout their time? How does their work evolve over a four-semester longitudinal time frame (as MA1 or MA2 experience levels) in the first-year writing program? [Please note that the abbreviations used above and throughout the article to designate instructor experience levels are as follows: MA1 (first-year master’s students); MA2 (second-year master’s students); PHD1 (first-year doctoral students); INS (instructors: those with 3 or more years’ teaching experience who are not currently pursuing an additional degree, nearly all of whom held a master’s degree).] Methodology: This study extends the work of Anson and Anson (2017), who first surveyed writing instructors and program administrators to create wordlists that survey respondents associated with “high-quality” and “novice” responses, and then examined a corpus of nearly 50,000 peer responses produced at a single university to learn to what extent instructors and student peers adopted this lexicon. Specifically, the study analyzes a corpus of instructor comments to students using the Anson and Anson wordlists associated with principled and novice commentary to see if new writing instructors align more closely with the concepts represented in either list during their first semester in the program. 
It then tracks four cohorts for evolution and change in their vocabulary of feedback over their next three semesters in the program; the study also compares the vocabulary used in their comments to that used by experienced instructors in the program over the same time. Results: The study found that from the outset, the new instructors (MA1) incorporated more of the principled response terms than the novice response terms. Overall, in comparing the MA1 instructors with the most experienced group (INS), the results reveal three important findings about the feedback of both MA1s and INSs in this program. First, while there are some differences in commentary as seen via examination of the two lexicons, the differences are perhaps smaller than one might assume. Second, the cohorts do increase their use of the principled terms as they move through the two years’ appointment in the program, but few of the increases demonstrate statistical significance. Third, few of the terms from either the novice or principled lexicon, with the exception of terms that also appear in the assignment descriptions, what I label as “content terms,” appear frequently in the overall corpus. Discussion: Based on the results, the instructors in this program had acquired a more consistent vocabulary, but not primarily one based on Anson and Anson’s two lexicons; instead, the most frequent and commonly used terms seem to come from a more local “canon,” that is, one based on the assignment descriptions and course outcomes. Regardless of whether the acquisition of a common vocabulary came from more global concepts or an assignment-based local canon, using common terms is something that Nancy Sommers (1982) saw as contributing to “thoughtful commentary” on student writing. 
As no one has previously studied how quickly new instructors acquire a professional vocabulary for responding to student writing, it is hard to know whether or not the results of this particular group of instructors would be considered “typical.” However, it may well be that the context of this writing program contributed to a more accelerated acquisition. Conclusions: Working with the lexicons developed via Anson and Anson’s survey is a useful starting point for understanding more of what our instructors actually do when responding to student writing, as well as for identifying critical differences in our instructors’ comments. The lexicons, though, only provide us with a subset of expected (thus acceptable) terms included in commentary: terms that afford students the opportunity to act upon receiving them via revision or transfer. Directions for Future Research: Additional research is necessary to expand and refine the lexicons and their impact on student writing. One possibility is to return to the current data set to engage in additional lexical analysis of both the novice and principled lexicons as well as the overall frequency tables to understand how terms are used in the context of response by the various instructor groups. Differences in the application of the terms might help us understand why comments might be labeled as more or less helpful to writers. Another strategy is to examine the data in terms of markers of stance; finally, topic modeling could be used to locate more subtle differences in the instructor comments that are not as easily identifiable with lexical analysis. Such examinations could serve as a baseline for broadening the study out to other sets of assignments and commentary, perhaps helping us build a set of threshold concepts for talking about writing with our students. Ultimately, it is important to replicate and expand Anson and Anson’s survey to other stakeholder groups. 
As with much research on the teaching of writing, we default to the group most accessible to us: other writing professionals. Replicating this survey with other stakeholders (graduate teaching assistants and undergraduate students at both lower- and upper-division levels) could help us understand whether or not a gap exists among the various stakeholders in understanding what constitutes good feedback.
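The lexicon comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the study's own procedure: the two word sets below are hypothetical placeholders (the Anson and Anson wordlists are not reproduced in this listing), the tokenizer is a simple assumption, and the function names are invented for illustration. The sketch reports how often terms from each list occur per 1,000 words of instructor comments, grouped by experience level.

```python
import re

# Hypothetical placeholder lexicons. The actual Anson and Anson (2017)
# wordlists are not given here, so these words are illustrative only.
PRINCIPLED = {"audience", "purpose", "evidence", "revise"}
NOVICE = {"grammar", "awkward", "wrong", "good"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_rate(comments, lexicon):
    """Return occurrences of lexicon terms per 1,000 tokens across all comments."""
    tokens = [tok for comment in comments for tok in tokenize(comment)]
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in lexicon)
    return 1000.0 * hits / len(tokens)


def rates_by_group(comments_by_group):
    """Compute both lexicon rates for each instructor group (e.g., MA1, INS)."""
    return {
        group: {
            "principled": lexicon_rate(comments, PRINCIPLED),
            "novice": lexicon_rate(comments, NOVICE),
        }
        for group, comments in comments_by_group.items()
    }
```

Normalizing by total tokens rather than reporting raw counts keeps groups that write different amounts of commentary comparable, which matters when cohorts differ in size.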
-
Abstract
Background: Current research in composition and writing studies is concerned with issues of writing program evaluation and how writing tasks and their sequences scaffold students toward learning outcomes. These issues are beginning to be addressed by writing analytics research, which can be useful for identifying recurring types of language in writing assignments and how those can inform task design and student outcomes. To address these issues, this study provides a three-step method of sequencing, comparison, and diagnosis to understand how specific writing tasks fit into a classroom sequence as well as compare to larger genres of writing outside of the immediate writing classroom environment. By doing so, we provide writing program administrators with tools for describing what skills students demonstrate in a sequence of writing tasks and diagnosing how these skills match with writing students will do in later contexts. Literature Review: Student writing that responds to classroom assignments can be understood as genres, insofar as they are constructed responses that exist in similar rhetorical situations and perform similar social actions. Previous work in corpus analysis has looked at these genres, which helps us as writing instructors understand what kind of constructed responses are required of students and to make those expectations explicit. Aull (2017) examined a corpus of first-year undergraduate writing assignments in two courses to create “sociocognitive profiles” of these assignments. We analyze student writing that responds to similar writing tasks, but use a different corpus method that allows us to understand the tasks in both local and global contexts. By doing so, we gain confidence and depth in our understanding of these tasks, analyze how they sequence together, and are able to compare argumentative writing across institutions and contexts. 
Research Questions: Two questions guided our study: What is the trajectory of skills targeted by the sequence of tasks in the two first-year writing courses, as evidenced by the rhetorical strategies employed by the writers in successive assignments? Focusing on the final argument assignments, how similar are they to argumentative writing in other contexts, in terms of rhetorical profiles? Methodology: We first conducted a local analysis, in which we used a dictionary-based corpus method to analyze the rhetorical strategies used by writers in the first-year writing courses to understand how they built on each other to form a sequence. Having understood what skills students are demonstrating in a course, we then conducted a global analysis which calculated a “distance” between the first-year argument writing and a corpus of argument writing drawn from other contexts. Recognizing that there was a non-trivial distance, we then identified and evaluated the sources of the distance so that the writing tasks could be assessed or modified. Results: The local analysis revealed eight key rhetorical strategies that student writing exhibits between the two first-year writing courses. With this understanding, we then placed the argument writing in global contexts to find that the assignments in both courses differ somewhat from argument writing in other contexts. Upon analyzing this difference, we found that the first-year writing primarily differs in its usage of academic language, the personal register, assertive language, and reasoning. We suggest that these differences stem primarily from the rhetorical situation and learning objectives associated with first-year writing, as well as the sequencing of the courses. Discussion: The three-step method presented provides a means for writing program administrators to describe and analyze writing that students produce in their writing programs. 
We intend these steps to be understood as an iterative process, whereby writing programs can use these results to evaluate what rhetorical skills their students are exhibiting and to benchmark those against the program’s goals and/or other similar writing programs. Conclusions: By presenting these analyses together, we ultimately provide a cohesive method by which to analyze a writing program and benchmark students’ use of rhetorical strategies in relation to other argumentative contexts. We believe this method to be useful not only to individual writing programs, but to assessment literature broadly. In future research, we anticipate learning how this process will practically feed back into pedagogy, as well as understanding what placing writing tasks into a global context can tell us about genre theory.
-
Abstract
Background: Over a decade ago, the Stanford Study of Writing (SSW) collected more than 15,000 writing samples from undergraduate students, but to this point the corpus has not been analyzed using computational methods. Through the use of natural language processing (NLP) techniques, this study attempts to reveal underlying structures in the SSW, while at the same time developing a set of interpretable features for computationally understanding student writing. These features fall into three categories: topic-based features that reveal what students are writing about; stance-based features that reveal how students are framing their arguments; and structure-based features that reveal sentence complexity. Using these features, we are able to characterize the development of the SSW participants across four years of undergraduate study, specifically gaining insight into the different trajectories of humanities, social science, and STEM students. While the results are specific to Stanford University’s undergraduate program, they demonstrate that these three categories of features can give insight into how groups of students develop as writers. Literature Review: The Stanford Study of Writing (Lunsford et al., 2008; SSW, 2018) involved the collection of more than 15,000 writing samples from 189 students in the Stanford class of 2005. The literature surrounding the original study is largely qualitative (Fishman, Lunsford, McGregor, & Otuteye, 2005; Lunsford, 2013; Lunsford, Fishman, & Liew, 2013), so this study makes a first attempt at a quantitative analysis of the SSW. When considering the ethics of a computational approach, we find it important not to stray into the territory of writing evaluation, as purely evaluative systems have been shown to have limited instructional use in the classroom (Chen & Cheng, 2008; Weaver, 2006). Therefore, we find it important to take a descriptive, rather than evaluative approach. 
All of the features that we extract are both interpretable and grounded in prior research. Topic modeling has been used on undergraduate writing to improve the prediction of neuroticism and depression in college students (Resnik, Garron, & Resnik, 2013), stance markers have been used to show the development of undergraduate writers (Aull & Lancaster, 2014), and parse trees have been used to measure the syntactic complexity of student writing (Lu, 2010). Research Questions: What computational features are useful for analyzing the development of student writers? Based on these features, what insights can we gain into undergraduate writing at Stanford and similar institutions? Methodology: To extract topic features, we use LDA topic modeling (Blei, Ng, & Jordan, 2003) with Gibbs Sampling (Griffiths, 2002). To extract stance features, we replicate the stance markers approach from a past study (Aull & Lancaster, 2014). To describe sentence structure, we use parse trees generated using Shift-Reduce dependency parsing (Sagae & Tsujii, 2008). For each parse tree, we use the tree depth and the average dependency length as heuristics for the syntactic complexity of the sentence. Results: Topic modeling was useful for sorting papers into academic disciplines, as well as for distinguishing between argumentative and personal writing. Stance markers helped us characterize the intersection between the majors that students hold and the topics that they are writing about at a given time. Parse tree complexity demonstrated differences between writing in different disciplines. In addition, we found that students of different disciplines have different syntactic features even during their first year at Stanford. Discussion: Topic modeling has given us a picture of interdisciplinary study at Stanford by showing how often students in the SSW wrote about topics outside their majors. 
Furthermore, studying interdisciplinary Stanford students allowed us to examine the intersection of a student’s major and current topic of writing when analyzing the other two sets of features. Stance markers in the SSW show that both field of study and topic of writing influence the ways in which students employ metadiscourse. In addition, when looking at stance across years, we see that Seniors regress towards their First-Year habits. The complexity results raise the question of whether different disciplines have different “ideal” levels of writing complexity. Conclusions: The present study yields insight into undergraduate writing at Stanford in particular. Notably, we find that students develop most as writers during their first two years and that students of different majors develop as writers in different ways. We consider our three categories of features to be useful because they were able to give us these insights into the dataset. We hope that, moving forward, educators will be able to use this kind of analysis to understand how their students are developing as writers.
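The two structure-based heuristics named in the methodology, tree depth and average dependency length, can be sketched in Python once a dependency parse is available. This is an illustrative sketch, not the authors' implementation. It assumes the parser output has already been reduced to a list of head indices in the common CoNLL style (token positions are 1-based and a head of 0 marks the root), that the parse is a well-formed tree, and that the root token counts as depth 1. Those conventions and the function names are assumptions.

```python
def tree_depth(heads):
    """Return the number of tokens on the longest root-to-leaf path.

    heads[i] is the 1-based position of the head of token i + 1,
    or 0 if that token is the root. Assumes a well-formed tree.
    """
    depths = {}

    def depth(tok):
        # Memoized walk from a token up to the root.
        if tok not in depths:
            head = heads[tok - 1]
            depths[tok] = 1 if head == 0 else depth(head) + 1
        return depths[tok]

    return max((depth(tok) for tok in range(1, len(heads) + 1)), default=0)


def average_dependency_length(heads):
    """Return the mean linear distance between each non-root token and its head."""
    lengths = [
        abs(tok - head)
        for tok, head in enumerate(heads, start=1)
        if head != 0
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0


# "She reads long books": She -> reads, reads = root, long -> books, books -> reads
example_heads = [2, 0, 4, 2]
example_depth = tree_depth(example_heads)
example_length = average_dependency_length(example_heads)
```

For the example sentence, the longest path runs from "reads" through "books" to "long", and the three dependency arcs span distances of 1, 1, and 2 tokens.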
-
Abstract
Background: Employing natural language processing and latent semantic analysis, the current work was completed as a constituent part of a larger research project for designing and launching artificial intelligence in the form of deep artificial neural networks. The models were evaluated on a proprietary corpus retrieved from a data warehouse; the corpus had been extracted from MyReviewers, a sophisticated web application designed for peer review in written communication, which was actively used in several higher education institutions. The corpus of laboratory reports in STEM annotated by instructors and students was used to train the models. Under the Common Rule, research ethics were ensured by protecting the privacy of subjects and maintaining the confidentiality of data, which mandated corpus de-identification. Literature Review: De-identification and pseudonymization of textual data have remained an actively studied research question for several decades. Their importance is established by numerous laws and regulations in the United States and internationally, including the HIPAA Privacy Rule and FERPA. Research Question: Text de-identification requires a significant amount of manual post-processing for eliminating faculty and student names. This work investigated automated and semi-automated methods for de-identifying student and faculty entities while preserving author names in cited sources and reference lists. It was hypothesized that a natural language processing toolkit and an artificial neural network model with named entity recognition capabilities would facilitate text processing and reduce the amount of manual labor required for post-processing after matching essays to a list of users’ names. The suggested techniques were applied with supplied pre-trained models without additional tagging and training. 
The goal of the study was to evaluate three approaches and find the most efficient one among those using a users’ list, a named entity recognition toolkit, and an artificial neural network. Research Methodology: The current work studied de-identification of STEM laboratory reports and evaluated the performance of the three techniques: brute-force search with a user list, named entity recognition with the OpenNLP machine learning toolkit, and NeuroNER, an artificial neural network for named entity recognition built on the TensorFlow platform. The complexity of the task stemmed from a dilemma: names belonging to students, instructors, or teaching assistants must be removed, while all other names (e.g., authors of referenced papers) must be preserved. Results: The evaluation of the three selected methods demonstrated that automating de-identification of STEM lab reports is not possible in a setting where named entity recognition methods are employed with pre-trained models. The highest results were achieved by the users’ list technique with 0.79 precision, 0.75 recall, and 0.77 F1 measure, which significantly outperformed OpenNLP with 0.06 precision, 0.14 recall, and 0.09 F1, and NeuroNER with 0.14 precision, 0.56 recall, and 0.23 F1. Discussion: The low performance of the OpenNLP and NeuroNER toolkits was explained by the complexity of the task and the unattainability of customized models due to imposed time constraints. An approach for masking possible de-identification errors is suggested. Conclusion: Unlike multiple cases described in the related work, de-identification of laboratory reports in STEM remained a non-trivial, labor-intensive task. 
Applied out of the box, a machine learning toolkit and an artificial neural network technique did not improve on the performance of the brute-force approach based on user list matching. Directions for Future Research: Customized tagging and training on the STEM corpus are presumed to improve the outcomes of machine learning and, especially, artificial intelligence methods. Applying other natural language toolkits may lead to a more effective solution.
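A user list search of the kind this abstract evaluates, together with the F1 measure it reports, might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the `[NAME]` placeholder, the case-insensitive whole-word matching, and the sample roster are all hypothetical. The F1 function uses the standard harmonic mean of precision and recall.

```python
import re


def deidentify(text, user_names, placeholder="[NAME]"):
    """Mask every whole-word occurrence of a roster name in the text.

    Names are tried longest first so that a full name is masked before
    any shorter name it contains. Matching is case-insensitive (an assumption).
    """
    names = sorted({name for name in user_names if name}, key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical roster and report text, for illustration only.
masked = deidentify("Report by Jane Doe. See Smith (2010).", ["Jane Doe"])
user_list_f1 = round(f1_score(0.79, 0.75), 2)
```

A plain list search cannot tell a roster name from an identical name in a citation, so a cited author who shares a name with a student or instructor would also be masked. That is one face of the dilemma the methodology describes, and it is why the surrounding context (reference lists, in-text citations) matters for any refinement.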
January 2017
-
Abstract
Background: Contemporary research in composition studies emphasizes the constitutive power of genres. It also highlights the prevalence of the most common genre in students’ transition into advanced college writing, the argumentative essay. Consistent with most research in composition, and therefore most studies of general, first-year college writing, such research has primarily emphasized genre context. Other research, in international applied linguistics research and particularly English for Academic Purposes (EAP), has focused less on first-year writers but has likewise shown the frequent use of argumentative essays in undergraduate writing. Together, these studies suggest that the argumentative essay is represented more than other genres in early college writing development, and that any given genre favors particular discourse features in contrast with other genres students might write. A productive next step, but one not yet realized, is to bring these discussions together, in research that uses context-informed corpus analysis that investigates students’ assignment contexts and analyzes the discourse that characterizes the tasks and genres students write. This study offers an exploratory, context-informed analysis of argumentative and explanatory writing by first-year college writers. Based on the corpus findings, the article underscores discourse as an integral part of the sociocognitive practices embedded in genres, and accordingly considers new ways to conceptualize student writing genres and to inform instruction and assignment design. Research questions: Four questions guided the inquiry: What are the key discursive practices associated with annotated bibliographies and argumentative essays written by the same students in the same course? What are the key discursive practices associated with visual analyses and argumentative essays written by the same students in the same course? 
What are the key discursive practices associated with the two argumentative tasks in comparison with the two explanatory tasks? Finally, how might corpus-based findings inform the design of particular assignment tasks and genres in light of a range of writing goals? Methodology: The article outlines a context-informed corpus analysis of lexical and grammatical keywords in part-of-speech tagged writing by first-year college students across courses at a U.S. institution. Using information from assignment descriptions and rubrics, the study considers four projects that also represent two macro-genres: an annotated bibliography and a visual analysis, both part of the explanatory macro-genre, and two argumentative essays, both part of the argumentative macro-genre. Results: The corpus analysis identifies lexical and grammatical keywords in each of the four tasks as well as in the macro-genres of argumentative versus explanatory writing. These include generalized, interpersonal, and persuasive discourse in argumentative essays versus more specified, informational, and elaborated discourse in explanatory writing, regardless of course or task. Based on these findings, the article discusses the discursive practices prioritized in each task and each macro-genre. Conclusions: The findings, based on key discourse patterns in tasks within the same course and in macro-genres across courses, pose important questions regarding writing task design and students’ adaptation to different genres. The macro-genre keywords specifically inform exploratory sociocognitive “profiles” of argumentative and explanatory tasks, offered in the final section. These argument and explanation profiles strive to account for discourse patterns, genre networks, and purposes and processes—in other words, multiple aspects of habituated thinking and writing practices entailed in each one relative to the other. 
As discussed in the conclusion, the profiles aim to (1) underscore discourse patterns as integral to the work of genres, (2) highlight adaptive discourse strategies as part of students’ meta-language for writing, and (3) identify multiple, macro-level (e.g., audience), meso-level (paragraph- and section-level), and micro-level (e.g., discourse patterns) aspects of genres to help instructors identify and specify multiple goals for writing assignments.