Journal of Writing Analytics
5 articles
January 2026
January 2024
January 2018
-
Abstract
The Writing Mentor™ (WM) application is a Google Docs add-on designed to help students improve their writing in a principled manner and to promote their writing success in postsecondary settings. WM provides automated writing evaluation (AWE) feedback using natural language processing (NLP) methods and linguistic resources. AWE features in WM have been informed by research about postsecondary student writers often classified as developmental (Burstein et al., 2016b), and these features address a breadth of writing sub-constructs (including use of sources, claims, and evidence; topic development; coherence; and knowledge of English conventions). Through an optional entry survey, WM collects self-efficacy data about writing and English language status from users. Tool perceptions are collected from users through an optional exit survey. Informed by language arts models consistent with the Common Core State Standards Initiative and valued by the writing studies community, WM takes initial steps to integrate the reading and writing process by offering a range of textual features, including vocabulary support, intended to help users to understand unfamiliar vocabulary in coursework reading texts. This paper describes WM and provides discussion of descriptive evaluations from an Amazon Mechanical Turk (AMT) usability task situated in WM and from users-in-the-wild data. The paper concludes with a framework for developing writing feedback and analytics technology.
January 2017
-
Abstract
Aim: This research note focuses on some of the consequences of big data as an emerging methodology. Its purpose is to provide a brief literature review of the method’s development and some of the critical questions researchers should consider as they move forward. Salvo (2012) contends that big data as a form of design of communication itself “is necessarily a rhetorically-based field” (p. 38). With big data as an up-and-coming methodology (McNely, 2012; Salvo, 2012), using caution in its application is a necessity for scholars. Not only should researchers seek out the unseen and untapped applications of big data, but they should learn its limitations as well (Spinuzzi, 2009). To adopt a methodology is to adopt its flaws.
Problem Formation: This section identifies a gap in the field as it relates to some of the consequences of applying big data as a methodology and seeing it as a rhetorical tool. As big data gains steam in the humanities, some are sure to question what they see as a flaw: the act of quantifying language. This argument is not new, nor is its rebuttal. Harris (1954) discusses the distributional structure of language, in which the parts of a sentence act as co-occurrents, each in a particular position and each with a relationship to the other co-occurrents (p. 146). Salvo (2012) argues that the combination of these new methodologies and technologies “knits together invention, arrangement, style, memory, and delivery in ways that challenge conceptions of print based literacy and textuality” (p. 39). While big data itself has several rhetorical methodologies embedded within it, deciding which one to use depends on the amount of data and how it’s aggregated.
Information Collection: As described above, this research note functions primarily as a brief review of literature. This section focuses on how writing analytics developed from content analysis in mass communications and shifted into latent semantic analysis assisted by computer technology. Riffe, Lacy, and Fico (1995) offer a clear explanation of content analysis, which was developed with comparatively small data sets in mind: “Usually, but not always, content analysis involves drawing representative samples of content, training coders to use the category rules developed to measure or reflect differences in content, and measuring reliability (agreement or stability over time) of coders applying the rules” (p. 2). Drawing a representative sample of content was once more feasible, but in the digital age the amount of content increases exponentially every day.
Conclusions: As latent semantic analysis is an extension of quantitative content analysis (and vice versa), and knowing that an adopted methodology carries adopted flaws, it makes sense to turn to some of the concerns voiced by mass communication scholars in order to understand limitations. As quantitative content analysis grew in popularity in mass communication, its methods were refined. Reporting the reliability of a study adds credibility to the study itself, and when a human coder is involved, the reporting of this intercoder reliability becomes imperative (Hayes & Krippendorff, 2007; Krippendorff, 2008, 2011). While intercoder reliability measures the degree to which coders agree, researchers should also be keenly aware of the theory and valence informing their study, which affect their coders and, ultimately, the results of the study itself.
Directions for Further Research: As the field of writing studies begins to adopt big data methodologies, researchers must continue to challenge and question their applications, implementations, and implications, turning to familiar questions from our own fields. Big data is exciting and new, but it’s not the methodology to explain it all. It’s just as rhetorical as every other methodology; it’s just better at hiding it.
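To make the intercoder reliability discussed above concrete, the following is a minimal sketch of two agreement measures for two coders assigning nominal categories. It is an illustration only, not a procedure taken from the research note: the function names are hypothetical, and the alpha computation is a simplified form of Krippendorff's alpha that assumes exactly two coders, nominal categories, and no missing codes.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units on which the two coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of units.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Simplified Krippendorff's alpha: two coders, nominal data, no missing codes.

    alpha = 1 - D_o / D_e, where D_o is observed disagreement and D_e is the
    disagreement expected if codes were assigned by chance.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of units.")

    total_values = 2 * len(coder_a)                 # every unit contributes two codes
    value_counts = Counter(coder_a) + Counter(coder_b)

    # Each disagreeing unit contributes two off-diagonal coincidences.
    disagreements = sum(a != b for a, b in zip(coder_a, coder_b))
    observed = 2 * disagreements

    # Sum of n_c * n_k over all ordered pairs of distinct categories.
    expected = total_values ** 2 - sum(n ** 2 for n in value_counts.values())
    if expected == 0:
        raise ValueError("Alpha is undefined when only one category is used.")

    return 1 - (total_values - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical codes for ten units (0 = absent, 1 = present).
    coder_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    coder_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
    print(f"percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
    print(f"alpha (nominal):   {krippendorff_alpha_nominal(coder_a, coder_b):.3f}")
```

Raw percent agreement does not correct for agreement expected by chance, which is one reason chance-corrected coefficients are usually reported alongside it or in its place.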
-
Abstract
Background: While it is commonly recognized that almost every work and research discipline utilizes its own taxonomy, the language used within a specific discipline may also vary depending on numerous factors, including the desired effect of the information being communicated and the intended audience. Different audiences are reached through publication of information, including research results, in different types of publication outlets such as newspapers, newsletters, magazines, websites, and journals. Prior research has shown that students, both undergraduate and graduate, as well as faculty may have a difficult time locating information in different publication outlet types (e.g., magazines, newspapers, journals). The type of publication may affect the ease of understanding and also the confidence placed in the acquired information. A text analytics tool for classifying the source of research as a newsletter (used as a substitute for newspaper articles), a magazine, or an academic journal article has been developed to assist students, faculty, and researchers in identifying the likely source type of information and classifying their own writings with respect to these possible publication outlet types.
Literature Review: Literature on information literacy is discussed, as it forms the motivation for the reported research. Additionally, prior research on using text mining and text analytics is examined to better understand the methodology employed, including a review of the original Scale of Theoretical and Applied Research system, adapted for the current research.
Research Questions: The primary research question is: Can a text mining and text analytics approach accurately determine the most probable publication source type with respect to being from a newsletter, magazine, or journal?
Methodology: A text mining and text analytics algorithm, STAR’ (System for Text Analytics-based Ranking), was developed from a previously researched text mining tool, STAR (Scale of Theoretical and Applied Research), that was used to classify articles as theoretical or applied research. The new text mining method, STAR’, analyzes the language used in manuscripts to determine the type of publication. This method first mines all words from corresponding publication source types to determine a keyword corpus. The corpus is then used in a text analytics process to classify full newsletters, magazine articles, and journal articles with respect to their publication source. All newsletters, magazine articles, and journal articles are from the library and information sciences (LIS) domain.
Results: The STAR’ text analytics method was evaluated as a proof of concept on a specific LIS organizational newsletter, as well as articles from a single LIS magazine and a single LIS journal. STAR’ was able to classify the newsletters, magazine articles, and journal articles with 100% accuracy. Random samples from another similar LIS newsletter and a different LIS journal were also evaluated to examine the robustness of the STAR’ method in the initial proof of concept. Following the positive results of the proof of concept, additional journal, magazine, and newsletter articles were used to evaluate the generalizability of STAR’. The second-round results were very positive for differentiating journals and newsletters from other publication types, but revealed potential issues for distinguishing magazine articles from other types of publications.
Discussion: STAR’ demonstrates that the language used for transferring information within a specific discipline does differ significantly depending on the intended recipients of the research knowledge. Further work is needed to examine language usage specific to magazine articles.
Conclusions: The STAR’ method may be used by students and faculty to identify the likely source of research or discipline-specific information. This may improve trust in the reliability of information, because different levels of rigor are applied to different types of publications. Additionally, the STAR’ classifications may be used by students, faculty, or researchers to determine the most appropriate type of outlet, and correspondingly the most appropriate audience, for the information reported in their own manuscripts. Publishing in the most relevant outlet type improves the chance of successfully sharing information with audiences who will deem it reliable.
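The two-step approach described in the Methodology (mine words per publication source type to form a keyword corpus, then use that corpus to classify full texts) can be illustrated with a minimal sketch. This is not the authors' STAR’ implementation, whose details the abstract does not give. The function names, the relative-frequency scoring rule, the `top_k` cutoff, and the sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_keyword_corpus(training_docs, top_k=50):
    """Assumed rule: keep the words most over-represented in each source type.

    training_docs maps a source type (e.g. "journal") to a list of texts.
    Returns a dict mapping each source type to a set of keywords.
    """
    counts = {
        label: Counter(tok for doc in docs for tok in tokenize(doc))
        for label, docs in training_docs.items()
    }
    totals = {label: sum(c.values()) or 1 for label, c in counts.items()}

    keywords = {}
    for label, counter in counts.items():
        scores = {}
        for word, n in counter.items():
            own = n / totals[label]
            other = max(
                (counts[o][word] / totals[o] for o in counts if o != label),
                default=0.0,
            )
            if own > other:  # word is relatively more frequent in this type
                scores[word] = own - other
        ranked = sorted(scores, key=scores.get, reverse=True)
        keywords[label] = set(ranked[:top_k])
    return keywords


def classify(text, keywords):
    """Score a text by counting tokens that match each type's keywords."""
    tokens = tokenize(text)
    scores = {
        label: sum(tok in kws for tok in tokens)
        for label, kws in keywords.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Invented one-sentence "documents", standing in for full LIS texts.
    training = {
        "newsletter": ["Members are invited to the annual chapter meeting and luncheon."],
        "magazine": ["Five tips librarians can use to refresh their displays this summer."],
        "journal": ["We test the hypothesis using regression analysis and report the results."],
    }
    keywords = build_keyword_corpus(training)
    label, scores = classify("The regression results support the hypothesis.", keywords)
    print(label, scores)
```

A sketch like this does no stopword filtering and breaks ties by dictionary order, so with training sets this small its output says nothing about how a real system would behave on full newsletters, magazine articles, or journal articles.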