Journal of Writing Research

39 articles
Topic: assessment

February 2026

  1. Empirical studies of writing and generative AI: Introduction to the special issue
    Abstract

    This special issue of the Journal of Writing Research brings together seven empirical studies of the relationship between writing and generative AI, examining what can be systematically observed and measured about the functioning of generative AI in educational and professional writing contexts. Collectively, the studies demonstrate the necessity and value of methodological pluralism for investigating a complex, rapidly evolving phenomenon. In their contributions, the researchers use experimental comparisons, mixed-methods intervention designs, corpus-based analyses, computational linguistic techniques, and qualitative interpretive approaches. Taken together, these methods enable lines of inquiry that no single approach could sustain: comparisons of AI and human performance in professional writing tasks; analyses of how writers at different ages and levels of expertise engage AI tools; examinations of how assessment systems register and respond to AI-generated prose; and investigations of how human readers interpret texts with ambiguous authorship. By foregrounding both the affordances and limitations of different methodological traditions, the articles present a multifaceted approach to the study of writing and generative AI.

    doi:10.17239/jowr-2026.17.03.01
  2. Using AI to understand students’ self-assessments of their writing
    Abstract

    This study focuses on a generative AI approach to facilitate qualitative analysis in Writing Studies research. We gathered 13,336 one-sentence to one-paragraph responses written by 3,334 incoming students in a directed self-placement program administered at a large R1 U.S. university. In these responses, students describe their high school writing experience and college writing expectations. In stage one of the project, we pilot the use of Retrieval-Augmented Generation to expedite the selection of relevant responses for a topic—in this case, students’ positive self-assessments as writers. The selected responses were then compared to a random sample and rated by three faculty with writing expertise. In stage two, these faculty generated codes and themes from a subset of the responses, incorporating ChatGPT-4 through the stages of thematic analysis. Results show that the use of AI expedites and enhances qualitative analysis, but human participation in the process is still essential. We suggest a machine-in-the-loop framework with which Writing Studies researchers can more readily integrate generative AI to study large corpora of student writing.

    doi:10.17239/jowr-2026.17.03.07
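
    As a rough illustration of the retrieval step described above, the sketch below shortlists student responses by semantic similarity to a topic query. The embedding model, query wording, and cut-off are placeholders rather than details from the study, and the subsequent human-rating and LLM-assisted thematic-analysis stages are not shown.

```python
# Hypothetical sketch of the retrieval step in a RAG-style, machine-in-the-loop workflow:
# shortlist responses that look relevant to a topic so humans can rate and code them.
# Model name, query text, and top_k are illustrative assumptions.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def shortlist(responses, query, top_k=200):
    """Rank free-text responses by cosine similarity to the query; keep the top_k."""
    emb = model.encode(responses, normalize_embeddings=True)
    q = model.encode([query], normalize_embeddings=True)[0]
    sims = emb @ q                       # cosine similarity (embeddings are unit-normalized)
    order = np.argsort(-sims)[:top_k]
    return [(responses[i], float(sims[i])) for i in order]

# candidates = shortlist(all_responses, "I am a confident and capable writer")
# Shortlisted responses would then go to human raters and thematic coding.
```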

June 2025

  1. The KLiCKe corpus: Keystroke logging in compositions for knowledge evaluation
    Abstract

    Despite the growing interest in the dynamics of the writing process in writing research, publicly available large-scale corpora of keystroke logs have been rare. We introduce KLiCKe, a freely available collection of keystroke logs for around 5,000 argumentative texts written by adults in the United States. The KLiCKe corpus also includes human-rated holistic scores for the essays as well as writers’ demographic details, their typing skills, and vocabulary knowledge. We describe our methods for constructing the corpus and present descriptive statistics for its different components. To illustrate the use of the KLiCKe corpus, we report a study using a subset of the corpus to investigate whether keystroke features are associated with holistic writing quality for L1 and L2 writers. The study shows that higher writing scores are related to shorter pauses in general, shorter between-word pauses, a lower proportion of deletions, a higher proportion of insertions, and less process variance. The KLiCKe corpus provides a robust resource for studying the dynamics of text production and revision, one that should help spur the development of process-oriented tools and methodologies in writing assessment and instruction.

    doi:10.17239/jowr-2025.17.01.02
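
    The abstract above links process features such as pause durations and deletion/insertion proportions to holistic quality. As a minimal sketch, and under an assumed log format, the function below derives a few such features from a single keystroke log; it is illustrative only and does not reproduce the KLiCKe feature set.

```python
# Minimal sketch (not the KLiCKe pipeline): derive a few process features from a
# simple keystroke log. Assumed log format: list of (timestamp_ms, key) events,
# where key is a typed character or "BACKSPACE". Threshold and feature names are illustrative.
def keystroke_features(events, long_pause_ms=2000):
    pauses, between_word_pauses, deletions = [], [], 0
    for (t_prev, k_prev), (t, k) in zip(events, events[1:]):
        gap = t - t_prev
        pauses.append(gap)
        if k_prev == " ":                # pause occurring at a word boundary
            between_word_pauses.append(gap)
        if k == "BACKSPACE":
            deletions += 1
    n = len(events)
    return {
        "mean_pause_ms": sum(pauses) / len(pauses) if pauses else 0.0,
        "mean_between_word_pause_ms": (sum(between_word_pauses) / len(between_word_pauses)
                                       if between_word_pauses else 0.0),
        "proportion_deletions": deletions / n if n else 0.0,
        "long_pause_rate": (sum(g > long_pause_ms for g in pauses) / n) if n else 0.0,
    }
```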

August 2023

  1. Comparative approaches to the assessment of writing: Reliability and validity of benchmark rating and comparative judgement
    Abstract

    In recent years, comparative assessment approaches have gained ground as a viable method for assessing text quality. Instead of assigning absolute scores to a text, as in holistic or analytic scoring methods, raters in comparative assessments judge text quality either by comparing texts to pre-selected benchmarks representing different levels of writing quality (i.e., the benchmark rating method) or through a series of pairwise comparisons with other texts in the sample (i.e., comparative judgement; CJ). In the present study, text quality scores from the benchmark rating method and CJ are compared in terms of their reliability, convergent validity, and score distribution. Results show that benchmark ratings and CJ ratings were highly consistent and converged on the same construct of text quality. However, the distribution of benchmark ratings showed a central tendency. We discuss how the two methods can be integrated and used so that writing can be assessed reliably, validly, and efficiently in both writing research and practice.

    doi:10.17239/jowr-2024.15.03.03
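
    For readers unfamiliar with comparative judgement, the sketch below shows one common way to turn pairwise winner/loser decisions into a quality scale: a Bradley–Terry model fitted with simple iterative updates. It is a generic illustration under that modelling assumption, not the estimation procedure used in the study.

```python
# Illustrative Bradley–Terry fit for comparative judgement data.
# `comparisons` is a list of (winner_id, loser_id) decisions from judges.
from collections import defaultdict

def bradley_terry(comparisons, n_iter=100):
    texts = {t for pair in comparisons for t in pair}
    wins = defaultdict(float)
    n_ij = defaultdict(int)
    for w, l in comparisons:
        wins[w] += 1
        n_ij[frozenset((w, l))] += 1
    p = {t: 1.0 for t in texts}
    for _ in range(n_iter):              # minorization-maximization updates
        new_p = {}
        for i in texts:
            denom = 0.0
            for j in texts:
                if j == i:
                    continue
                n = n_ij.get(frozenset((i, j)), 0)
                if n:
                    denom += n / (p[i] + p[j])
            # +0.5 pseudo-win keeps texts that never won from collapsing to zero
            new_p[i] = (wins[i] + 0.5) / denom
        mean = sum(new_p.values()) / len(new_p)
        p = {t: v / mean for t, v in new_p.items()}
    return p                              # higher value = higher estimated text quality
```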

June 2023

  1. Work descriptions written by third-graders: An aspect of disciplinary literacy in primary craft education
    Abstract

    This study focuses on disciplinary literacy in primary craft education. Disciplinary literacy refers to the specialised ways of reading, writing, and speaking in a particular discipline. In Finland, crafts is an obligatory school subject, and pupils are supposed to conceive and manage a complete crafts process, including documentation. However, disciplinary literacy in crafts has rarely been studied, let alone at the primary level. In this study, we explored the quality of a sample of work descriptions produced by third-graders. The data included digitally produced work descriptions (N=79) written by 42 third-grade pupils in a Finnish primary school. Based on a qualitative analysis, six main dimensions of work descriptions as a textual genre emerged: word count, crafts vocabulary, structure, spelling, multimodality, and self-assessment. The quality of work descriptions was analysed quantitatively according to scoring criteria based on these dimensions. A cluster analysis indicated that there were three groups of work descriptions with respect to their level of disciplinarity: limited, emerging, and advanced descriptions. The results show that the structure of the disciplinary texts develops first, and subject-specific vocabulary stabilises after that. The paper discusses the foundation for disciplinary literacy in primary craft education.

    doi:10.17239/jowr-2023.15.01.02

March 2023

  1. Efficient measurement of writing knowledge with forced-choice tasks: Preliminary data using the Student Knowledge of Writing Tests
    Abstract

    Much of the research that has examined the writing knowledge of school-age students has relied on interviews to ascertain this information, which is problematic because interviews may underestimate breadth and depth of writing knowledge, require lengthy interactions with participants, and do not permit a direct evaluation of a prescribed array of constituent knowledge elements. For these reasons, our goal in this study is to report the development, piloting, and field testing, using a sample of 335 students from grades 4 and 5, of four alternate versions of a writing knowledge assessment—the Student Knowledge of Writing Test (SKOWT)—that uses forced-choice responses to evaluate students’ knowledge of writing processes, genre elements, and linguistic features of written language. All versions of the SKOWT demonstrated adequate internal consistency reliability and construct validity based on exploratory factor analyses following deletion of some items. In addition, there was acceptable predictive criterion validity based on associations of SKOWT scores with subtests from the Test of Written Language-4 and measures of narrative, opinion, and informative essay quality. We discuss how the SKOWT might be used in future research and educational practice.

    doi:10.17239/jowr-2023.15.02.06

February 2023

  1. How Prior Information from National Assessments can be used when Designing Experimental Studies without a Control Group
    Abstract

    National assessments yield a description of the proficiency level in a domain while accounting for differences between tasks. For instance, in writing assessments the level of proficiency is typically evaluated with a variety of topics and multiple tasks. This enables generalizations from specific tasks to a domain. In (quasi-)experimental research, however, writing skills are often evaluated with a single task. Yet conclusions about the effectiveness of the treatment are formulated at the level of the domain, which is, to put it mildly, quite a stretch: although conclusions about the effect of the treatment are specific to the task administered, they are often generalized to the domain without any form of reservation. This raises the question of whether we can use the results of national assessments about differences between tasks in the analyses of experimental studies. In this paper, we demonstrate how the information in a baseline data set can be used as a kind of control condition in the analysis of an experimental study.

    doi:10.17239/jowr-2023.14.03.05

June 2022

  1. Baseline assessment in writing research: A case study of popularization discourse in first-year undergraduate students
    Abstract

    In popularization discourse, insights from academic discourse are recontextualized and reformulated into newsworthy, understandable knowledge for a lay audience. Training in popularization discourse is a relatively new and unexplored research topic. Existing studies in the science communication field make little use of baseline assessments and pretests in teaching interventions. This methodological problem leads both to a lack of evidence for claims about student progress and to a gap in knowledge about baseline popularization skills. We draw the topic into the realm of writing research by conducting a baseline assessment of pre-training popularization skills in first-year undergraduate students. Undergraduate science communication texts are analyzed to identify instances of popularization strategies using a coding scheme for text analysis of popularization discourse. The results indicate a lack of genre knowledge in both academic and popularized discourse: textual styles are either too academic or overly popularized, the academic text is misrepresented, and the essential journalistic structure is lacking. An educational program in popularization discourse should therefore focus on the genre demands of popularization discourse, awareness of academic writing conventions, the genre shift between academic and popularized writing, the role of the student as a writer, and stylistic attributes.

    doi:10.17239/jowr-2022.14.01.02

June 2020

  1. Implementing Automated Writing Evaluation in Different Instructional Contexts: A Mixed-Methods Study
    Abstract

    There is increasing evidence that automated writing evaluation (AWE) systems support the teaching and learning of writing in meaningful ways. However, little research has explored how AWE may be integrated within different instructional contexts or examined the associated effects on students’ writing performance. This paper describes the AWE system MI Write and presents the results of a mixed-methods study that investigated the integration and implementation of AWE with writing instruction at the middle-school level, examining AWE integration within both a traditional process approach to writing instruction and strategy instruction based on the Self-Regulated Strategy Development model. Both instructional contexts were evaluated with respect to fostering growth in students’ first-draft writing quality across successive essays as well as students’ and teachers’ experiences and perceptions of teaching and learning with AWE. Multilevel model analyses indicated that during an eight-week intervention students in both instructional contexts exhibited growth in first-draft writing performance at comparable rates. Qualitative analyses of interview data revealed that AWE’s influence on instruction was similar across contexts; specifically, the introduction of AWE resulted in both instructional contexts taking on characteristics consistent with a framework for deliberate practice.

    doi:10.17239/jowr-2020.12.01.04

February 2020

  1. Applying group dynamic assessment procedures to support EFL writing development: Students’ and teachers’ perceptions in focus
    Abstract

    The present study investigated the effects of applying cumulative group dynamic assessment (G-DA) procedures (Poehner, 2009) to support EFL writing development in a university context in Iran. It focused on learner achievement, patterns of occurrence of mediation incidents, and learners’ and teachers’ perceptions of G-DA. Quantitative data were collected from learners’ performance on writing tests and from the frequency of mediation incidents involving EFL writing components based on Jacobs, Zinkgraf, Wormuth, Hartfiel, and Hughey’s (1981) scale. Findings revealed that G-DA was more effective than conventional explicit intervention for supporting EFL writing development, and that it worked best for low-ability learners compared with mid- and high-ability ones. In addition, the number of mediation incidents declined from 27 in session one to 8 in the final session, confirming the efficacy of G-DA in promoting both EFL writing and learner self-regulation. Most teacher mediation involved language use, vocabulary, and organization; fewer incidents involved content and mechanics. Qualitative data analysis indicated that most learners and teachers held positive attitudes towards the efficacy of G-DA for supporting EFL writing development. However, a few participants asserted that the procedures were unsystematic, stressful, time-consuming, and inappropriate for large classes.

    doi:10.17239/jowr-2020.11.03.02

October 2019

  1. Using human judgments to examine the validity of automated grammar, syntax, and mechanical errors in writing
    Abstract

    This study introduces GAMET, which was developed to help writing researchers examine the types and percentages of structural and mechanical errors in texts. GAMET is a desktop application that extends LanguageTool v3.2 with a user-friendly graphical user interface that affords the automatic assessment of writing samples for structural and mechanical errors. GAMET is freely available, works on a variety of operating systems, affords document batch processing, and groups errors into a number of structural and mechanical error categories. This study also tests LanguageTool’s validity against hand-coded assessments of accuracy and meaningfulness on first language (L1) and second language (L2) writing corpora, examines how well LanguageTool replicates human coding of structural and mechanical errors in an L1 corpus, and assesses associations between GAMET output and human ratings of essay quality. Results indicate that LanguageTool can be used to successfully locate errors within text. However, while the accuracy of LanguageTool is high, its recall of errors is low, especially for punctuation errors. Nevertheless, the errors coded by LanguageTool show significant correlations with human ratings of writing and with human-coded grammar and mechanics errors. Overall, the results indicate that while LanguageTool fails to flag a number of errors, the errors it does flag provide an accurate profile of the structural and mechanical errors made by writers.

    doi:10.17239/jowr-2019.11.02.01
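
    The validity checks described above come down to comparing automatically flagged errors with human-coded ones. The sketch below shows one simple way to compute precision (how accurate the flags are) and recall (how many human-coded errors are found) from character spans; the overlap criterion is an assumption, not GAMET’s actual matching rule.

```python
# Hedged sketch: compare automated error flags with human-coded error spans.
# Spans are (start, end) character offsets; two spans "match" if they overlap at all.
def spans_overlap(a, b):
    return a[0] < b[1] and b[0] < a[1]

def precision_recall(auto_spans, human_spans):
    hits_auto = sum(any(spans_overlap(a, h) for h in human_spans) for a in auto_spans)
    precision = hits_auto / len(auto_spans) if auto_spans else 0.0
    hits_human = sum(any(spans_overlap(h, a) for a in auto_spans) for h in human_spans)
    recall = hits_human / len(human_spans) if human_spans else 0.0
    return precision, recall

# Example with made-up spans:
# precision_recall([(10, 14), (40, 45)], [(10, 15), (20, 24), (41, 44)]) -> (1.0, 0.667)
```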
  2. Assessment of Authorial Voice Strength in L2 Argumentative Written Task Performances: Contributions of Voice Components to Text Quality
    Abstract

    The purpose of this study was twofold: (a) to examine the level of authorial voice strength among Iranian second language (L2) writers; and (b) to investigate the relationship between L2 learners’ authorial voice strength and the quality of their argumentative written task performances. Argumentative writing samples were elicited from 129 upper-intermediate L2 learners in writing courses. To quantify learners’ voice strength, these samples were scored by two raters using an analytic voice rubric. Raters also provided a holistic rating of the overall authorial voice strength in the written argumentations. The quality of the argumentations was measured using the TOEFL scoring rubric. While descriptive results indicated that learners demonstrated a low level of voice strength in their argumentations, results from a Multiple Regression Analysis (MRA) suggested positive associations between voice strength, along with two of its dimensions, and writing quality. Moreover, results from a Multiple Correspondence Analysis (MCA) pointed to an association between low and mid levels of writing quality and low voice strength, and to the prevalence of high and mid voice strength among learners with high writing proficiency. Finally, while an Item Response Theory (IRT) analysis revealed that the ‘presence’ dimension of authorial voice was the most difficult one for L2 learners, a Differential Item Functioning (DIF) analysis showed that the difficulty of the three voice dimensions did not differ significantly across genders. The findings are discussed in relation to English L2 writing within the Iranian context.

    doi:10.17239/jowr-2019.11.02.04

June 2019

  1. Textual Mediation in Simulated Nursing Handoffs: Examining How Student Writing Coordinates Action
    Abstract

    In clinical nursing simulations, a group of students provides care for a robotic patient during a structured scenario. As care is transferred from one group to another, they participate in a patient handoff, with outgoing students passing key information on to incoming students. In healthcare, the nursing handoff is a critical and perilous communication moment that is mediated by a range of participants and texts. Drawing on observations and video recordings of 52 simulation handoffs in the United States, this article examines how two student-designed texts – a collaborative patient chart and individual notes – are leveraged during the handoff. I also consider how handoff talk and writing change as student nursing knowledge increases over the course of a year. By focusing on textual mediation of the simulated nursing handoff, this article contributes to existing research on professional writing pedagogy and to nursing scholarship on the handoff. Ultimately, it argues that a textual mediation framework can help bridge classroom and professional contexts by evaluating student writing not for how successfully it meets a set of imposed criteria but for how effectively it supports classroom activities.

    doi:10.17239/jowr-2019.11.01.03

October 2018

  1. Scaling up Graduate Writing Workshops: From needs assessment to teaching practices
    Abstract

    Graduate students often encounter obstacles related to written science communication that can set them back on their path towards degree completion. Efforts to support these students should be informed by what they actually need or desire; yet oftentimes, programs are developed based on assumptions or intuitions. In other cases, proven models from the literature are used to develop programs; however, due to a lack of justification for approaches and vague descriptions of daily teaching and learning activities, the intricacies of their design remain relatively unknown. Thus, for institutions looking to establish research writing resources or build on existing infrastructure, more research is needed to demonstrate how needs assessment can directly transfer to program development. In this paper, I describe how findings from a campus-wide needs assessment of graduate students (N = 310) and faculty (N = 111) informed the development of design principles for a week-long dissertation writing workshop. The complete description of the intervention, including how its main elements and content align with socio-cognitive perspectives on writing, can facilitate replication, theory building, and communication about effective writing instruction. This work also offers a springboard, and establishes a blueprint, for future research and program development.

    doi:10.17239/jowr-2018.10.02.07
  2. Describing multifaceted writing interventions: From design principles for the focus and mode of instruction to student and teacher activities
    Abstract

    To enable a proper evaluation of the results of writing interventions for scientific replication and theory building, it is of vital importance that the design principles underlying an intervention, and their operationalization, are clearly described. A detailed description of a writing intervention is also important from a practical point of view, to foster dissemination and successful implementation of the intervention in practice. In this paper we propose a framework for reporting on the design principles of multifaceted intervention programs in a systematic manner. Unique features of this framework are that we (1) separate the design principles for the focus and the mode of instruction, (2) systematically describe how these principles are integrated and operationalized into learning and teaching activities, and (3) systematically describe the professional development teachers need to be able to execute the teaching activities. We demonstrate how this framework can be applied with a worked example of an intervention that we designed, implemented, and tested in elementary schools in the Netherlands. The framework provided in this paper makes the core features of writing interventions transparent to reviewers, other scholars, and educational practitioners, and helps ensure that an intervention includes all necessary elements in an optimal way. Moreover, this type of framework facilitates the comparison of interventions across contexts and countries.

    doi:10.17239/jowr-2018.10.02.03

June 2018

  1. Assessment of L2 Student Writing: Does Teacher Disciplinary Background Matter?
    Abstract

    This preliminary study examines the rating behavior of five composition teachers and five ESL writing teachers while evaluating a text from a university-level non-native (L2) English speaking student. Using an eye tracker, we measured raters’ dwell times and reading behaviors across four areas of interest—rhetoric, organization, vocabulary, and grammar. Results indicate that raters with differing disciplinary backgrounds read the text differently. L2 writing teachers tended to spend more time on, and to re-read, the rhetorical, lexical, and grammatical features of the text while skipping over more of the grammar errors, whereas composition teachers read the text more deliberately. The findings suggest that L2 writing teachers were more prone to skim and scan for information on which to base a grade, while composition teachers delayed rating decisions until after reviewing the entire text, a pattern corroborated in previous research. These findings can expand our understanding of how disciplinary background influences rating processes, which can inform rater training procedures, especially in disciplinary writing contexts where L2 writing is judged by individuals with and without expertise in composition or second language writing. Moreover, the study demonstrates the utility of eye-tracking methods for examining the cognitive processes associated with reading and scoring student writing.

    doi:10.17239/jowr-2018.10.01.01
  2. Persuasion by numbers: How does numeral marking of arguments in bad news letters influence persuasion?
    Abstract

    To what extent does numbering the reasons for a negative decision influence the persuasive force of the text? That is the focus of this study, in which we report an experiment (with 265 participants) wherein the direct effects and the indirect effects of numeral markings are analyzed in two linguistic contexts: in the introduction of the upcoming enumeration of reasons (the so-called ‘trigger’) and in the lead-ins of the successive reasons of the enumeration itself. The experiment was conducted within the framework of the Elaboration Likelihood Model (Petty and Cacioppo, 1984) and the Schematic Text Structural Expectations Hypothesis (Sanders and Noordman, 2000; Mulder, 2008). Adding numeral markers in both trigger and lead-ins turns out to enhance the persuasiveness of the text in several ways. It stimulates readers to elaborate more on the content of the reasons. It helps readers to scrutinize the reasons and stimulates recall, which contributes to a more balanced judgment. The markings also have a direct positive effect on persuasiveness, which points to an effect on low elaborating readers. Furthermore, inconsistent implementation of numeral markings (the combination of a numeral trigger with non-numeral lead-ins or a non-numeral trigger with numeral lead-ins) has a negative indirect effect on persuasiveness via text evaluation. This effect is explained by assuming that the Schematic Text Structural Expectations Hypothesis not only applies to text processing, but to text evaluation as well.

    doi:10.17239/jowr-2018.10.01.03

February 2018

  1. Effects of the Specificity and the Format of External Representations on Students’ Revisions of Fictitious Others’ Texts
    Abstract

    University students are often challenged with the demand of providing cohesive explanatory texts. To support students in revising their explanatory texts with regard to cohesion, it could be useful to provide students with external representations as formative feedback. In this study, we provided participants with a scenario in which they were asked to review a fictitious student’s draft containing several cohesion gaps. Additionally, participants received an external representation as feedback to support them during their revisions. We varied the format (concept map versus outline) and the specificity (general versus specific) of the provided external representations. We found that participants with specific concept map representations correctly noticed more cohesion gaps, and perceived less cognitive load during reviewing than participants with the specific outline representation. Students with general external representations showed the lowest performance on the noticing task and the highest amount of cognitive load. However, there were no differences among the external representations regarding the quality of students’ revisions. Evidently, specific concept maps can be regarded as a useful scaffold to support students’ evaluation processes. However, additional instructional support is needed, particularly for novice writers, to effectively revise expository texts for cohesion.

    doi:10.17239/jowr-2018.09.03.04

October 2017

  1. Improving Writing in Primary Schools through a Comprehensive Writing Program
    Abstract

    This study examined the effects of an innovative comprehensive writing program in upper primary education on students’ writing performance and on teachers’ classroom practices, beliefs and skills. The program focused on the communicative nature of writing, on writing as a process, and on explicit teaching of five genre-specific writing strategies. It was implemented by 43 teachers in their regular classrooms (Grades 4 to 6, N = 1052), with three conditions: (1) a writing program condition, (2) the same program complemented by professional development sessions and coaching, and (3) a control condition in which teachers taught their usual writing lessons. Students’ writing performance was measured three times with multiple writing tasks. Data on teachers’ practices, beliefs and skills were collected through lesson observations, interviews, questionnaires, teacher logs, and a text assessment task. The comprehensive writing program had a beneficial effect on students’ writing performance and the extent to which teachers taught writing strategies. The complementary professional development and coaching had a direct effect on the number of lessons implemented, and an indirect effect on students' performance. Overall, the innovation proved to be effective for improving students’ writing performance in the upper grades of primary schools.

    doi:10.17239/jowr-2017.09.02.04

June 2017

  1. Effects of transcription ability and transcription mode on translation: Evidence from written compositions, language bursts and pauses when students in grades 4 to 9, with and without persisting dyslexia or dysgraphia, compose by pen or by keyboard
    Abstract

    This study explored the effects of transcription on translation products and processes of adolescent students in grades 4 to 9 with and without persisting specific language disabilities in written language (SLDs-WL). To operationalize transcription ability (handwriting and spelling) and transcription mode (by pen on a digital tablet or by standard US keyboard), diagnostic groups contrasting in patterns of transcription ability were compared while composing autobiographical (personal) narratives by handwriting or by keyboarding: typically developing students (n=15), students with dyslexia (impaired word reading and spelling, n=20), and students with dysgraphia (impaired handwriting, n=19). They were compared on seven outcomes: total words composed, total composing time, words per minute, percent of spelling errors, average length of pauses, average number of pauses per minute, and average length of language bursts. They were also compared on automaticity of the transcription modes: writing the alphabet from memory by handwriting or by keyboarding (they could look at the keys). Mixed ANOVAs yielded main effects for diagnostic group on percent of spelling errors, words per minute, and length of language bursts. Main effects for transcription mode were found for automaticity of writing modes, total words composed, words per minute, and length of language bursts; there were no significant interactions. Regardless of mode, the dyslexia group made more spelling errors, composed at a slower rate, and produced shorter language bursts than the typical group. The total number of words, total time composing, words composed per minute, and pauses per minute were greater for keyboarding than for handwriting, but length of language bursts was greater for handwriting. Implications of these results for conceptual models of composing and for educational assessment practices are discussed.

    doi:10.17239/jowr-2017.09.01.01
  2. Scaffolding tertiary students’ writing in a genre-based writing intervention.
    Abstract

    In recent years, embedding writing into subject teaching through genre-based writing instruction (GBWI) has been advocated in tertiary education. However, little is known about how this approach can be shaped and implemented in this context. In a design-based research study in Dutch higher professional education, we aimed to explore how GBWI can be used to scaffold students’ writing within the subject of Event Organization and to what extent students learned to use the typical features of the genre ‘event proposal’. A 5-week subject-specific writing intervention was designed and subsequently enacted by a subject lecturer in a first-year class involving 13 students. Using a coding scheme for interactional scaffolding strategies, five interaction fragments were analyzed against the background of designed scaffolding and learning goals. The fragments indicated that the interplay of designed scaffolding (instructional materials and activities) and interactional scaffolding (teacher-student interactions) promoted students’ writing performance over time. Comparison of students’ pre- and posttests by means of an analytic scoring scheme pointed to statistically significant growth in the use of typical genre features (d=1.41). Together, the results of this design-based research study indicate the potential of GBWI for scaffolding and promoting tertiary students’ writing.

    doi:10.17239/jowr-2017.09.01.02

February 2017

  1. A Synthesis of Mathematics Writing: Assessments, Interventions, and Surveys
    Abstract

    Mathematics standards in the United States describe communication as an essential part of mathematics. One outlet for communication is writing. To understand the mathematics writing of students, we conducted a synthesis to evaluate empirical research about mathematics writing. We identified 29 studies that included a mathematics-writing assessment, intervention, or survey for students in 1st through 12th grade. All studies were published between 1991 and 2015. The majority of assessments required students to write explanations to mathematical problems, and fewer than half scored student responses according to a rubric. Approximately half of the interventions involved the use of mathematics journals as an outlet for mathematics writing. Few intervention studies provided explicit direction on how to write in mathematics, and a small number of investigations provided statistical evidence of intervention efficacy. From the surveys, the majority of students expressed enjoyment when writing in mathematics settings but teachers reported using mathematics writing rarely. Across studies, findings indicate mathematics writing is used for a variety of purposes, but the quality of the studies is variable and more empirical research is needed.

    doi:10.17239/jowr-2017.08.03.04

October 2016

  1. Understanding the benefits of receiving peer feedback: A case of matching ability in peer-review
    Abstract

    Peer assessment is a technique with many possible benefits for instruction across the curriculum. However, the value obtained from receiving peer feedback may critically depend upon the relative abilities of the author and the reviewer. We develop a new model of such relative-ability effects on peer assessment based on the well-supported Flower and Hayes model of revision processes. To test this model across the stages of peer assessment (initial text quality, review content, revision amount, and revision quality), 189 undergraduate students in a large introductory course were randomly assigned to consistently receive feedback from higher-ability or lower-ability peers. Overall, there were few main effects of author ability or reviewer ability. Instead, as predicted, there were many interactions between the two factors, suggesting the new model is useful for understanding ability factors in peer assessment. Often lower-ability writers benefitted more from receiving feedback from lower-ability reviewers, while higher-ability writers benefitted equally from receiving feedback from lower-ability and higher-ability reviewers. This result leads to the practical recommendation of grouping students by ability during peer assessment, contrary to student beliefs that only feedback from high-ability peers is worthwhile.

    doi:10.17239/jowr-2016.08.02.03

June 2016

  1. Student Use of Automated Essay Evaluation Technology During Revision
    Abstract

    The purpose of this study was to examine how six middle-school students used Automated Essay Evaluation (AEE) technology to revise their writing. Students in a combined 7th- and 8th-grade Literacy class at one school participated in two in-depth think-alouds and semi-structured interviews as they used AEE technology to revise their writing on two separate writing tasks. Constant-comparative analysis of the data, including think-alouds, semi-structured interviews, and student writing, along with a separate quantitative analysis of student revisions, revealed themes in three areas: (a) student use of AEE feedback to make revisions; (b) student motivation to revise their writing when using AEE technology; and (c) student understanding and application of AEE feedback during revision. Findings indicated that students who received low scores used AEE feedback to prompt non-surface revisions whereas students with high scores did not. Further, students who used AEE feedback to prompt non-surface revisions made more overall non-surface revisions, revised for different reasons, made more T-unit-level revisions, and had more revisions rated as major successes than students who did not use the feedback. Students who used the AEE feedback tool, MY Editor, were often confused by the grammar and punctuation feedback and had a low success rate using it. However, students were more successful with the spell-checker feedback alone. In addition, findings show that students were motivated to revise because of the numerical scores the technology assigned their writing. Moreover, knowledge that they would receive a score prompted students to revise extensively before submitting their writing for scoring. Finally, student understanding of the AEE feedback varied. Implications for classroom use of AEE technology and directions for future research are discussed.

    doi:10.17239/jowr-2016.08.01.05

February 2016

  1. Rhetorical Patterns in Citations across Disciplines and Levels of Participation
    Abstract

    Writing researchers have long attempted to classify and describe patterns of citation and source use, both to describe disciplinary differences and to identify discourse-level characteristics of new knowledge production. The analysis of large corpora has provided great insight into the formal characteristics of citations, but little information about their rhetorical nature, which interview studies show to be central to understanding source-use practices. This study reports on an attempt to understand and describe patterns of source use across disciplines, genres, and levels of participation through systematic verbal data analysis of documents produced by sixteen participants in expert/novice pairs (faculty advisor/doctoral advisee) from four disciplines (Computer Science, Chemical Engineering, Materials Science Engineering, and Humanities and Social Sciences). The results of this analysis showed that, despite some disciplinary differences, all participants used similar patterns of reference use, namely elaboration, evaluation, and relation to one’s own work.

    doi:10.17239/jowr-2016.07.03.06

October 2015

  1. Teaching children to write: A meta-analysis of writing intervention research
    Abstract

    It has been established that in the Netherlands, as in other countries, a majority of students do not attain the desired level of writing skills at the end of elementary school. Time devoted to writing is limited, and only a minority of schools succeed in teaching writing effectively. An improvement in the way writing is taught in elementary school is clearly required. In order to identify effective instructional practices we conducted a meta-analysis of writing intervention studies aimed at grades 4 to 6 in a regular school setting. Average effect sizes were calculated for ten intervention categories: strategy instruction, text structure instruction, pre-writing activities, peer assistance, grammar instruction, feedback, evaluation, process approach, goal setting, and revision. Five of these categories yielded statistically significant results. Pairwise comparison of these categories revealed that goal setting (ES = 2.03) is the most effective intervention to improve students’ writing performance, followed by strategy instruction (ES = .96), text structure instruction (ES = .76), peer assistance (ES = .59), and feedback (ES = .88), respectively. Further research is needed to examine how these interventions can be implemented effectively in classrooms to improve elementary students’ writing performance.

    doi:10.17239/jowr-2015.07.02.2
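
    For readers unfamiliar with how category-level effect sizes such as those above are pooled, the sketch below shows a conventional fixed-effect, inverse-variance average. The numbers in the usage line are invented for illustration; they are not data from this meta-analysis, which may also have used a different (e.g., random-effects or multilevel) model.

```python
# Conventional fixed-effect pooling of study effect sizes (illustrative only).
def pooled_effect(effects_with_variances):
    """effects_with_variances: list of (effect_size, sampling_variance) per study."""
    weights = [1.0 / v for _, v in effects_with_variances]
    weighted_sum = sum(w * es for (es, _), w in zip(effects_with_variances, weights))
    pooled = weighted_sum / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5
    return pooled, se                     # pooled effect size and its standard error

# pooled_effect([(1.8, 0.20), (2.3, 0.35), (1.9, 0.15)])   # hypothetical studies in one category
```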

May 2015

  1. Conditions for writing to learn
    Abstract

    This paper is a response to an invitation from the editors of the special issue to comment on the ingredients of effective writing to learn interventions as reflected in the contributions to the special issue. The six papers in the issue vary widely in approach and underlying theoretical frameworks but share the broad common theme of writing to learn. Within this, they vary along three main dimensions: (i) how learning is defined and assessed, and in particular whether they assess effects of the writing intervention on content knowledge; (ii) related to this, whether they are primarily focussed on discipline specific skills or on more general effects of writing; and (iii) whether they are designed to carry out a controlled evaluation of the writing intervention or rather are concerned with describing the design and purpose of a specific intervention. In what follows, I will first consider the general characteristics of the papers in relation to these three dimensions. I will then reflect on the findings of the individual papers, and then conclude by relating the papers to my personal understanding of writing to learn in terms of a dual-process model of writing.

    doi:10.17239/jowr-2015.07.01.09

February 2015

  1. Writing in Test and Non-test Situations: Process and Product
    Abstract

    Writers taking tests sometimes complain that they cannot perform to their true abilities because of time constraints. We therefore examined differences in process and product between texts produced under test and non-test conditions. Ten L2 postgraduates wrote two argumentative essays, one under test conditions, with only forty minutes allowed and no recourse to resources, and one under non-test conditions, with unlimited time and access to the Internet. Keystroke logging, screen-capture software, and stimulated recall protocols were used, with participants explaining and commenting on their writing processes. Sixteen writing process types were identified. Higher proportions of translation and surface revision were recorded in the test situation, while meaningful revision and evaluation were both higher in the non-test situation. There was a statistically significant difference in the time allocated to different processes at different stages. Experienced teachers awarded the non-test texts a mean score almost one point (0.8) higher. A correlational analysis of the relationship between writing process and product quality showed that while the distribution of writing processes can have an impact on text quality in the test situation, it had no effect on the product in the non-test situation.

    doi:10.17239/jowr-2015.06.03.2

October 2013

  1. Book review: Shermis, M. D., & Burstein, J. (Eds.) (2013). Handbook of Automated Essay Evaluation: Current applications and new directions. New York and London: Routledge. ISBN-10: 0415810965
    doi:10.17239/jowr-2013.05.02.4

June 2013

  1. Evaluative misalignment of 10th-grade student and teacher criteria for essay quality: An automated textual analysis
    Abstract

    Writing is a necessary skill for success in the classroom and the workplace; yet, many students are failing to develop sufficient skills in this area. One potential problem may stem from a misalignment between students' and teachers' criteria for quality writing. According to the evaluative misalignment hypothesis, students assess their own writing using a different set of criteria from their teachers. In this study, the authors utilize automated textual analyses to examine potential misalignments between students' and teachers' evaluation criteria for writing quality. Specifically, the computational tools Coh-Metrix and Linguistic Inquiry and Word Count (LIWC) are used to examine the relationship between linguistic features and student and teacher ratings of students' prompt-based essays. The study included 126 students who wrote timed, SAT-style essays and assessed their own writing on a scale of 1-6. Teachers also evaluated the essays using the SAT rubric on a scale of 1-6. The results yielded empirical evidence for student-teacher misalignment and advanced our understanding of the nature of students' misalignments. Specifically, teachers were attuned to the linguistic features of the essays at both surface and deep levels of text, whereas students' ratings were related to fewer overall textual features and most closely associated with surface-level features.

    doi:10.17239/jowr-2013.05.01.2
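
    The misalignment analysis described above amounts to correlating linguistic features with two sets of ratings. The sketch below runs that comparison for a hypothetical table of per-essay features and scores; the column names and workflow are illustrative, not the study’s Coh-Metrix/LIWC pipeline.

```python
# Sketch of a feature-rating alignment check. Assumes a DataFrame with one row per
# essay: linguistic feature columns plus student and teacher scores (names hypothetical).
import pandas as pd

def alignment_profile(df, feature_cols, student_col="student_score", teacher_col="teacher_score"):
    """Correlate each linguistic feature with student and teacher ratings."""
    rows = [{
        "feature": col,
        "r_student": df[col].corr(df[student_col]),
        "r_teacher": df[col].corr(df[teacher_col]),
    } for col in feature_cols]
    return pd.DataFrame(rows).sort_values("r_teacher", ascending=False)

# profile = alignment_profile(essays, ["word_length", "referential_cohesion", "word_frequency"])
# Features correlating with teacher scores but not student scores would signal misalignment.
```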

February 2013

  1. Quantity versus quality. Effects of argumentation in bad news letters
    Abstract

    Do the quality and the quantity of arguments have an impact on the evaluation of bad news messages? To answer this question, two experiments were carried out using bad news letters in which the quality and the quantity of the arguments were manipulated in a contextually realistic way. The results of both experiments show that adding argumentation has a positive impact on the perceived politeness and the persuasive force of the letters. Furthermore, the studies show that the impact of strong arguments is greater than that of weak arguments. The effect of adding successive arguments is positive as well. However, the results indicate that one or two arguments are sufficient. Adding a third argument only minimally contributes to better evaluations.

    doi:10.17239/jowr-2013.04.03.4
  2. Read and think before you write: Prewriting time and level of print exposure as factors in writing and revision
    Abstract

    This study investigated situational and writer characteristics that influence the revision process. Thirty-four students who scored high on print exposure and 32 students who scored low on print exposure were given 10 or 70 seconds of prewriting time to think about each of two prompts before beginning to write the essays on a computer. A keystroke-logging program captured writing and editing behavior, including pauses, edits (deletions, substitutions, insertions), and prompt reviews. Quality was measured using an 8-factor, 3-point analytic scoring rubric. Results indicated that high print exposure students wrote longer and higher-quality essays than low print exposure students. In addition, the short prewriting time increased prompt reviewing and average pause lengths. High and low print exposure writers showed differential responses to the prewriting-time manipulation in terms of total pause-associated edits during writing. The complexity of the revision process and the importance of understanding multiple immediate variables in the writing situation are discussed.

    doi:10.17239/jowr-2013.04.03.1

November 2012

  1. Eliciting formative assessment in peer review
    Abstract

    Computer-supported peer review systems can support reviewers and authors in many different ways, including through the use of different kinds of reviewing criteria. It has become an increasingly important empirical question to determine whether reviewers are sensitive to different criteria and whether some kinds of criteria are more effective than others. In this work, we compared the differential effects of two types of rating prompts, each focused on a different set of criteria for evaluating writing: prompts that focus on domain-relevant aspects of writing composition versus prompts that focus on issues directly pertaining to the assigned problem and to the substantive issues under analysis. We found evidence that reviewers are sensitive to the differences between the two types of prompts, that reviewers distinguish among problem-specific issues but not among domain-writing ones; that both types of ratings correlate with instructor scores; and that problem-specific ratings are more likely to be helpful and informative to peer authors in that they are less redundant.

    doi:10.17239/jowr-2012.04.02.5

June 2012

  1. From reading to writing: Evaluating the Writer's Craft as a means of assessing school student writing.
    Abstract

    This article reports on part of a study investigating a new writing assessment, the Writer's Craft, which requires students to read a stimulus passage and then write a continuation adopting the style of the original. The article provides a detailed analysis of stimulus passages employed within this assessment scheme and students' written continuations of these passages. The findings reveal that this is a considerably more challenging assessment writing task than has previously been recognised; and that questions arise concerning the nature of the stimulus passages and the extent to which the assessment criteria captured what the students had achieved in their writing. The implications of these findings are discussed and recommendations are made.

    doi:10.17239/jowr-2012.04.01.1

March 2012

  1. Is it differences in language skills and working memory that account for girls being better at writing than boys?
    Abstract

    Girls are more likely to outperform boys in the development of writing skills. This study considered gender differences in language and working memory skills as a possible explanation for the differential rates of progress. Sixty-seven children (31 males and 36 females; M age 57.30 months) participated. Qualitative differences in writing progress were examined using a writing assessment scale from the Early Years Foundation Stage Profile (EYFSP). Quantitative measures of writing (number of words, diversity of words, number of phrases/sentences, and grammatical complexity of the phrases/sentences) were also analysed. The children were also assessed on tasks measuring their language production and comprehension skills and the visuo-spatial, phonological, and central executive components of working memory. The results indicated that the boys tended to perform significantly less well than the girls on all measures of writing except the grammatical complexity of sentences. Initially, no significant differences were found on any of the measures of language ability. Further, no significant differences were found between the genders in the capacity and efficiency of their working memory functioning. However, hierarchical regressions revealed that individual differences in gender and language ability, more specifically spoken language comprehension, predicted performance on the EYFSP writing scale. This finding accords well with the literature suggesting that language skills can mediate the variance in boys' and girls' writing ability.

    doi:10.17239/jowr-2012.03.03.5

December 2011

  1. Classifying paragraph types using linguistic features: Is paragraph positioning important?
    Abstract

    This study examines the potential for computational tools and human raters to classify paragraphs based on their position. A corpus of 182 paragraphs was collected from student argumentative essays. The paragraphs selected were initial, middle, and final paragraphs, corresponding to introductory, body, and concluding paragraphs. The paragraphs were analyzed with the computational tool Coh-Metrix on a variety of linguistic features related to textual cohesion and lexical sophistication and then modeled using statistical techniques. The paragraphs were also classified by human raters based on paragraph position. The reported model performed well above chance, with a classification accuracy similar to human judgments of paragraph type (66% accuracy for humans versus 65% for our model). The model’s accuracy increased when the analysis was restricted to longer paragraphs, which provide more linguistic coverage, and to paragraphs judged by human raters to be of higher quality. The findings support the notion that paragraph types contain specific linguistic features that allow them to be distinguished from one another. The findings reported in this study should prove beneficial in classroom writing instruction and in automated writing assessment.

    doi:10.17239/jowr-2011.03.02.3
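
    As a hedged sketch of the classification set-up described above, the code below predicts paragraph position from linguistic feature vectors and estimates accuracy by cross-validation. The classifier and validation scheme are stand-ins; the study’s statistical model may differ.

```python
# Illustrative paragraph-position classifier built on linguistic features.
# X: (n_paragraphs, n_features) array of cohesion/lexical measures (e.g., from Coh-Metrix);
# y: labels such as "initial", "middle", "final".
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classification_accuracy(X, y, folds=10):
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, y, cv=folds)   # per-fold accuracy
    return scores.mean()

# An accuracy around .65 would be in the range the abstract reports for both
# the statistical model and the human raters.
```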

February 2011

  1. Writing in natural sciences: Understanding the effects of different types of reviewers on the writing
    Abstract

    In undergraduate natural science courses, two types of evaluators are commonly used to assess student writing: graduate-student teaching assistants (TAs) or peers. The current study examines how well these approaches to evaluation support student writing. The differences between the two evaluators are likely to affect multiple aspects of the writing process: first-draft quality, the amount and types of feedback provided, the amount and types of revisions, and final-draft quality. We therefore examined how these aspects of the writing process were affected when undergraduate students wrote papers to be evaluated by a group of peers versus their TA. Several interesting results were found. First, the quality of the students' first drafts was greater when they were writing for their peers than when writing for their TA. In terms of feedback, peer reviewers provided longer comments and focused more on the prose than the TAs did. Finally, more revisions were made if the students received feedback from their peers, especially prose revisions. Despite all of the benefits seen with peers as evaluators, there was only a moderate difference in final draft quality. This result indicates that while peer review is helpful, there continues to be a need for research on how to enhance its benefits.

    doi:10.17239/jowr-2011.02.03.4

August 2010

  1. A dual purpose data base for research and diagnostic assessment of student writing
    Abstract

    The database of writing examined here serves a dual purpose. It is used as a research tool, with the writing performance of a large, nationally representative sample (N = 20,947) of students (years 4 to 12) interrogated to examine patterns of performance in writing; it was also designed to underpin a software tool for diagnostic assessment of writing. Viewing writing as accomplishing social communicative goals, performance was considered in terms of seven main purposes the writer may seek to achieve. Tasks related to each purpose were encapsulated in 60 writing prompts that included stimulus material. Participants produced one writing sample; the design ensured appropriate representation across writing purposes. Samples were scored using criteria differentiated according to purpose and curriculum level of schooling, and acceptable reliability was obtained. Analyses indicate that growth was most marked between years 8 and 10, arguably as the opportunity to write increases and writing is linked to learning in content areas. Variability in performance is relatively low at primary school and high at secondary school. Students at any level did not write equally well for different purposes. Mean scores across purposes at primary school were relatively similar, with ‘to instruct’ and ‘to explain’ highest. By years 11-12 there is a considerable gap between the highest scores (for ‘narrate’ and ‘report’) and the lowest (‘recount’), reflecting likely opportunities to practice writing for different purposes. Although girls performed better than boys, the difference in mean scores narrows by years 11-12.

    doi:10.17239/jowr-2010.02.02.3

April 2010

  1. Subordinated clauses usage and assessment of syntactic maturity: A comparison of oral and written retellings in beginning writers
    Abstract

    The present longitudinal study explores possible differences in syntactic complexity between oral and written story retellings produced by Spanish-speaking children at the end of the 1st and 2nd grades of primary education. It is assumed that differences between oral and written modalities can be found due in part to the cognitive demands of low-level writing skills. Indeed, it has been observed that written texts produced by children are shorter and of lower quality than oral ones (Berninger et al., 1992; Berninger & Swanson, 1994). However, how transcription skills might constrain the syntactic complexity of children's written texts is not well established. The children (N=163) who participated in this study attended three different schools located in Córdoba Province, Argentina, and were examined at the end of the 1st and 2nd year of primary education. The oral and written retellings were analyzed using length, T-unit number, and the Syntactic Complexity Index (SCI; Hunt, 1965, 1970). The analysis of children's productions showed differences between grades and modalities. The differences between modalities were found in text length and T-unit number, but not in SCI. These results suggest that transcription skills do not affect syntactic performance. Nevertheless, a more detailed analysis revealed differences between groups. Possible restrictions of the original text on children's performance were also observed. The implications and the scope of the SCI and of the units used for the analysis are further discussed.

    doi:10.17239/jowr-2010.02.01.2
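
    The measures named above (length, T-unit count, SCI) reduce to simple counts and ratios once the retellings have been segmented. The sketch below assumes pre-segmented input and operationalizes SCI as clauses per T-unit, one common reading of the index; the study’s exact formula may differ.

```python
# Minimal sketch: length and T-unit measures from pre-segmented retellings.
# t_units: list of T-units, each a list of clauses, each clause a list of words.
# (Segmentation itself, the hard part, is done manually or with a parser.)
def syntactic_measures(t_units):
    n_t_units = len(t_units)
    n_clauses = sum(len(tu) for tu in t_units)
    n_words = sum(len(cl) for tu in t_units for cl in tu)
    return {
        "length_words": n_words,
        "t_unit_count": n_t_units,
        "sci_clauses_per_t_unit": n_clauses / n_t_units if n_t_units else 0.0,
    }

# syntactic_measures([[["the", "dog", "ran"]], [["she", "said"], ["that", "he", "left"]]])
# -> {'length_words': 8, 't_unit_count': 2, 'sci_clauses_per_t_unit': 1.5}
```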