Assessing Writing
279 articles
April 2024
-
Is the variation in syntactic complexity features observed in argumentative essays produced by B1 level EFL learners in Finland and Pakistan attributable exclusively to their L1? ↗
Abstract
This study has explored the syntactic complexity features of English learners at the B1 Common European Framework of Reference (CEFR) (CoE, 2001) level from both Pakistan and Finland. The learners in question were taught English as a Foreign Language (EFL) using different pedagogical methods. This study took into account various factors including the learners' proficiency level, age, and grade, as well as variations in their native language. To assess the impact of the learners' native language and pedagogical methods on syntactic complexity features, twelfth grade EFL students from Upper-Secondary schools in both nations were given identical instructions and time limits to complete an English academic essay on the same topic. The study utilized the L2 Syntactic Complexity Analyzer (L2SCA) to extract fourteen syntactic complexity features, and Mann-Whitney U Tests were used to analyze the differences in the syntactic complexity features between the two groups. The study has revealed significant differences between Finnish and Pakistani EFL learners due to variations in their native language and the effects of pedagogical methods on syntactic complexity features. The implications of this study extend to language testing and assessment, the CEFR framework, and pedagogy in both Finland and Pakistan.
-
Assessing writing and spelling interest and self-beliefs: Does the type of pictorial support affect first and third graders’ responses? ↗
Abstract
An array of pictorial supports (e.g., emojis, geometrical figures, animals) is often used in studies assessing young students’ writing motivation with Likert scales. However, although these images may influence the students’ responses, sufficient rationales for these choices are often absent from the studies. To the best of our knowledge, the present study is the first to investigate two different types of pictorial support (circles vs. faces) in Likert scales assessing first and third graders’ writing interest, self-concept, and spelling interest and self-efficacy. The samples consist of 2197 first graders (mean age 6.8 years) and 1740 third graders (mean age 8.4 years). Results show statistically significant differences among the scales indicating that when face-scales are used, first-graders skip motivation items more often, and students in both grades avoid the minimum values of the scale more often. Gender differences are also found indicating that when face-scales are used, boys in third grade avoid maximum values more often, and girls in both grades avoid the minimum values more often. These findings suggest that circle-scales seem more appropriate than face-scales in scales measuring young students’ writing and spelling interest and self-beliefs.
-
Abstract
Research into the contribution of multimodality to language learning is gaining momentum. While most studies pave the way for new understandings of language teaching and learning, there is an increasing demand for comprehensive assessment practices, particularly within higher education contexts. A few studies have emphasized the importance of reflecting on and establishing criteria for the assessment of multimodal literacy. This is necessary to understand students’ contributions in detail and to provide them with effective support in developing their multimodal skills. This study discusses the assessment of multimodal writing in English for Specific Purposes (ESP) contexts. It presents the design of an analytical tool for assessing multimodal texts and provides an example of its application. This tool covers assessment categories such as language use, content expression, interpersonal meaning, multimodality, and creativity and originality. As an example, we focus on the multimodal writing of a video game narrative, a genre that requires the integration of multiple modes of communication to convey meaning more effectively. Finally, this study offers pedagogical insights into the assessment of multimodal literacy in ESP.
-
Characteristics of students’ task representation and its association with argumentative integrated writing performance ↗
Abstract
Task representation denotes students’ interpretation of what a learning or assessment task requires them to do. An argumentative integrated writing task, which involves the use of reading materials as claims or evidence for composing an essay, makes the role of task representation more critical than in other tasks, as writers may be confused about whether their task is to focus on synthesizing the reading materials that they comprehend or on expressing their own views. With the aim of exploring the characteristics of task representation and its association with integrated writing, this study invited 474 secondary four students from Hong Kong to participate in a think-aloud writing protocol followed by a stimulated recall interview (36 participants), and to complete an integrated writing task and a questionnaire (438 participants). Three factors of task representation were identified: source use, rhetorical purpose and text format. Significant positive correlations were found between the three factors and integrated writing performance. Theoretical and pedagogical implications are discussed.
January 2024
-
Abstract
The study reported in the paper starts with a hypothesis that errors observable in writing performances can account for much of the variability of the ratings awarded to them. The assertion is that this may be the case even when prescribed rating criteria explicitly direct rater focus towards successfully performed aspects of a writing performance rather than towards errors. The hypothesis is tested on a sample of texts rated independently of the study, using a five-point analytic rating scale involving ‘Can do’-like descriptors. The correlation between errors and ratings is ascertained using ordinal logistic regression, with a pseudo-R² of 0.51 obtained overall. Thus, with roughly 50% of score variability explainable by error occurrences, the stated hypothesis is considered confirmed. The study goes on to discuss the consequences of the findings and their potential use in the assessment of writing beyond the local assessment context.
October 2023
-
Abstract
With the objective of improving writing assessment of language instruction, we examine the lexical and syntactic features in two corpora of high and low scoring French texts of the Test du Certificat de Compétence en Langue Seconde (Second Language Certification Test; TCCLS) at the University of Ottawa (uOttawa). We first situate the test in its local context, demonstrating how our research objectives are born from specific needs to improve student outcomes. We then describe our creation of two corpora of high and low performing test takers, followed by lexical bundle (LB) analyses (Phase 1) and further linguistic complexity analyses with a French-language tool (Phase 2). Results indicate that high level writers used more LBs and borrowed more text from the prompt than low level writers. In addition, specific elements of linguistic complexity were identified, suggesting high level writers produced texts that were lexically richer and more syntactically advanced. We discuss the importance of these findings in improving our writing instruction, as well as the challenges of adapting tools and approaches traditionally associated with English to French.
July 2023
-
Abstract
Learning how to write occluded genres is an elusive task (Swales, 1996) – even more so in the case of students writing in a second or additional language. To achieve discourse competence in the use of one of these genres, in this case the ‘statement of purpose’ typical of post-graduate programme admission forms, it is first necessary to fully understand its features at both the macrotextual and microlinguistic levels (Gillaerts, 2003; Bhatia, 2004). This qualitative study focuses on the writing of learners of Spanish as an additional language to analyse whether feedback provided by peers impacts the quality of the statements of purpose they write. Through a dual discourse analysis of their written work and in-class interactions during peer-feedback sessions, our study finds that, when properly trained and using tailored assessment tools, students can use peer-assessment profitably to improve the quality of their statements of purpose, as well as to acquire appropriate metalanguage to guide others. Our results thus reconfirm the beneficial effects of helping students to achieve feedback literacy.
-
Beyond literacy and competency – The effects of raters’ perceived uncertainty on assessment of writing ↗
Abstract
This study investigated how common raters’ experiences of uncertainty in high-stakes testing are before, during, and after the rating of writing performances, what these feelings of uncertainty are, and what reasons might underlie such feelings. We also examined if uncertainty was related to raters’ rating experience or to the quality of their ratings. The data were gathered from the writing raters (n = 23) in the Finnish National Certificates of Proficiency, a standardized Finnish high-stakes language examination. The data comprise 12,118 ratings as well as raters’ survey responses and notes during rating sessions. The responses were analyzed by using thematic content analysis and the ratings by descriptive statistics and Many-Facets Rasch analyses. The results show that uncertainty is variable and individual, and that even highly experienced raters can feel unsure about (some of) their ratings. However, uncertainty was not related to rating quality (consistency or severity/leniency). Nor did uncertainty diminish with growing experience. Uncertainty during actual ratings was typically associated with the characteristics of the rated performances but also with other, more general and rater-related or situational factors. Other reasons external to the rating session were also identified for uncertainty, such as those related to the raters themselves. An analysis of the double-rated performances shows that although similar performance-related reasons seemed to cause uncertainty for different raters, their uncertainty was largely associated with different test-takers’ performances. While uncertainty can be seen as a natural part of holistic ratings in high-stakes tests, the study shows that even if uncertainty is not associated with the quality of ratings, we should constantly seek ways to address uncertainty in language testing, for example by developing rating scales and rater training. This may make raters’ work easier and less burdensome.
-
Abstract
Peerceptiv is a peer assessment tool developed by learning sciences researchers to help students demonstrate disciplinary knowledge through writing feedback practices. This review of Peerceptiv describes its key features while comparing it with other writing feedback tools and suggesting possibilities and limitations of using it to support AI-based online writing assessment across the disciplines. Future considerations regarding the use of Peerceptiv in assessing, teaching, and researching online writing are discussed.
April 2023
-
The design and cognitive validity verification of reading-to-write tasks in L2 Chinese writing assessment ↗
Abstract
Reading-to-write (RTW) tasks have been commonly employed in second language (L2) English academic writing pedagogy, and many studies have investigated the validity and reliability of RTW tasks in L2 English writing assessment. Meanwhile, few studies have examined the cognitive validity of RTW tasks, and the design and validation of such tasks in L2 Chinese academic writing assessment remain underexplored. This study develops a Chinese RTW task following a set of design criteria and procedures and evaluates its cognitive validity as an instrument of L2 Chinese academic writing assessment. The RTW task was administered to 15 undergraduate and 15 postgraduate L2 Chinese learners in an eye-tracking laboratory. Analyses of the task features and the eye-tracking and stimulated recall interview data suggested that the RTW task largely aligned with the characteristics of authentic tasks in real L2 Chinese academic writing contexts and elicited a representative range of cognitive processes in existing models of RTW cognitive processes. Many of these processes manifested in different ways between the two groups of participants at different L2 Chinese proficiency levels. Our findings have useful implications for understanding the cognitive validity of the RTW task in L2 Chinese writing assessment.
-
Genre pedagogy: A writing pedagogy to help L2 writing instructors enact their classroom writing assessment literacy and feedback literacy ↗
Abstract
As part of a larger case study, this single exploratory case study aims to explore the potential of genre-based pedagogy (GBP) to allow L2 writing instructors to enact their writing assessment literacy and feedback literacy. The findings demonstrate that GBP enabled the participating writing instructor of a genre-based EAP writing course to carry out effective writing classroom assessment practices and thus enact their writing assessment literacy and feedback literacy. GBP allowed effective writing classroom assessment practices such as diagnostic assessment and learner involvement in assessment. More specifically, genre exploration tasks led to diagnostic assessment and helped the instructor coordinate effective classroom discussions to elicit evidence of the students’ knowledge of the target genre that they would study. Second, students’ production of texts in target genres not only allowed the instructor to collect evidence of the students’ specific genre knowledge, but it also afforded learner involvement through self-reflection. The instructor could also efficiently interpret this evidence and provide formative feedback through pre-established genre-specific assessment criteria.
January 2023
October 2022
-
The persuasive essays for rating, selecting, and understanding argumentative and discourse elements (PERSUADE) corpus 1.0 ↗
Abstract
This paper introduces the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE) corpus. The PERSUADE corpus is a large-scale corpus of writing with annotated discourse elements. The goal of the corpus is to spur the development of new, open-source scoring algorithms that identify discourse elements in argumentative writing to open new avenues for the development of automatic writing evaluation systems that focus more specifically on the semantic and organizational elements of student writing.
-
Validity evidences for scoring procedures of a writing assessment task. A case study on consistency, reliability, unidimensionality and prediction accuracy ↗
Abstract
Scoring is a fundamental step in the assessment of writing performance. The choice of the scoring procedure as well as the adoption of a discrepancy resolution method can impact the psychometric properties of the scores and therefore the final pass/fail decision. In a comprehensive framework which considers scoring as part of the validation process of the scores, the aim of this paper is to evaluate the impact of rater mean, parity and tertium quid procedures on score properties. Using data from a writing assessment task applied in a professional context, the paper analyses score reliability, dependability, unidimensionality and decision accuracy on two sets of data: the complete data and a subsample of discrepant data. The results show better performance of the tertium quid procedure in terms of reliability indicators but a lower quality in defining construct unidimensionality.
-
Abstract
Written assessment feedback in higher education has been examined from different perspectives. However, there is limited empirical evidence of how tutors use language to provide assessment feedback on students’ assessed academic writing. By deploying the rarely used Appraisal framework in Systemic Functional Linguistics, this innovative study examined the use of evaluative language by tutors in feedback on undergraduate business students’ academic writing in two assignments at a distance university. The data consisted of 16 tutor assessment feedback summaries on eight students’ written assignments and interviews with those students. The Appraisal system of Attitude (Judgement, Appreciation and Affect) was used to analyse the evaluative language of the summaries. The analysis of student interviews provided insights into their perceptions of tutor feedback, complementing the linguistic analysis. The findings suggest that tutors’ evaluative language was primarily used to judge students rather than to appreciate the assignment or to show their emotional reactions, potentially owing to the distance learning context. Additionally, while most of the feedback was perceived positively, students found certain types of tutor feedback less helpful. The paper has implications for moving assessment feedback research forward through applying the Appraisal framework, improving assessment strategies and tutor formative feedback practices in writing assessment.
-
Structure and coherence as challenges in composition: A study of assessing less proficient EFL writers’ text quality ↗
Abstract
Students are usually expected to write full texts in English as a foreign language (EFL) at the end of secondary education. However, research on EFL writing at school is scarce, especially regarding less proficient writers, and seldom focuses on deep-level text features such as structure and coherence. Based on a sample of 166 EFL students in Year 9 attending German middle and lower performance track schools, this study examined 326 narrative and argumentative texts. First, we assessed structure and coherence via analytic ratings using detailed rubrics to gain insights into possible challenges for students. Our analysis showed that relevant text parts (such as the conclusion) were mostly missing and that students struggled to establish a broad common thread with argumentative texts being overall less structured and coherent than narrative texts. Second, we used the software Comproved® to conduct holistic ratings of overall text quality and compared them with our analytic ratings. Large correlations between both ratings suggest that structure and coherence are important aspects of text quality. We discuss how our rubrics can serve as a useful tool for assessment for learning and assist less proficient writers in establishing deep-level features in their texts.
-
Abstract
Integrated tasks are increasing in popularity, either replacing or complementing writing-only independent tasks in writing assessments. This shift has generated considerable research interest in investigating the underlying construct and features of integrated writing (IW) performances. However, due to the complexity of the IW construct, there are conflicting findings about whether and to what extent various language skills and IW text features correlate with IW scores. To understand the construct of IW, we conducted a meta-analysis to synthesize correlation coefficients between scores of IW performances and (1) other language skills and (2) text quality features of IW. We also examined factors that may moderate the correlation of IW scores with these two groups of correlates. The results showed that (1) reading and writing skills correlated more strongly with IW scores than listening did; and (2) text length had the strongest correlation, followed by source integration, organization and syntactic complexity, with lexical complexity showing the smallest correlation. Several IW task features affected the magnitude of correlations. The results supported the view that IW is a construct independent from, albeit related to, other language skills, and that IW task features may affect the construct of IW.
July 2022
April 2022
-
The mediating effects of student beliefs on engagement with written feedback in preparation for high-stakes English writing assessment ↗
Abstract
Research in L2 writing contexts has shown developing writers’ beliefs exert a powerful mediating effect on how they respond to written feedback. The mediating role of beliefs is magnified in preparation for high-stakes English writing assessment contexts, where tangible outcomes pivot on successful test performance. The present qualitative case study utilises data from semi-structured interviews to investigate how the beliefs of three self-directed IELTS preparation candidates mediated their affective, behavioural, and cognitive engagement with electronic teacher written feedback across three multi-draft Task 2 rehearsal essays. Utilising a metacognitive conceptual approach (Wenden, 1998), the study identified seven themes: 1) self-concept beliefs regulated engagement, 2) reliance on the expertise of a quality teacher, 3) engagement was mediated by individuals’ learning-to-write beliefs, 4) belief in comprehensive, critical written feedback, 5) feedback deemed transferable was more comprehensively engaged with, 6) entrenched test-taking strategy beliefs hindered engagement, and 7) supplementary self-directed learning activities were considered of limited value. The implications for practitioners of IELTS Writing preparation and the IELTS co-owners are discussed.