Vivekanandan Kumar
2 articles-
Abstract
Background: In recent years, the development of automated essay scoring (AES) systems has shifted in focus from merely assigning a holistic score to an essay to providing constructive feedback on it. Despite major advances in the domain, many objections persist concerning the credibility of these systems and their readiness to replace human scoring in high-stakes writing assessments. The purpose of this study is to shed light on how to build a relatively simple AES system based on five baseline writing features. The study shows that the proposed AES system compares very well with other state-of-the-art systems despite its obvious limitations. Literature Review: In 2012, ASAP (Automated Student Assessment Prize) launched a demonstration to benchmark the performance of state-of-the-art AES systems using eight hand-graded essay datasets originating from state writing assessments. These datasets are still used today to measure the accuracy of new AES systems. Recently, Zupanc and Bosnic (2017) developed and evaluated another state-of-the-art AES system, called SAGE, which included new semantic and consistency features and provided automatic semantic feedback for the first time. For ASAP dataset #8 (the dataset of interest in this study), the agreement between SAGE’s machine scores and the human scores, measured by quadratic weighted kappa, was 0.81, whereas it ranged between 0.60 and 0.73 for 10 other state-of-the-art systems (Chen et al., 2012; Shermis, 2014). Finally, this section discusses the limitations of AES, which stem mainly from its failure to assess the higher-order thinking skills that all writing constructs are ultimately designed to assess.
Research Questions: The research questions that guide this study are as follows: RQ1: What is the power of the writing analytics tool’s five-variable model (spelling accuracy, grammatical accuracy, semantic similarity, connectivity, lexical diversity) to predict the holistic scores of Grade 10 narrative essays (ASAP dataset #8)? RQ2: What is the agreement level between the computer rater based on the regression model obtained in RQ1 and the human raters who scored the 723 narrative essays written by Grade 10 students (ASAP dataset #8)? Methodology: ASAP dataset #8 was used to train the predictive model of the writing analytics tool introduced in this study. Each essay was graded by two teachers. When the two raters disagreed, a third rater resolved the score. Essay scores were weighted sums of four rubric scores. A multiple linear regression analysis was conducted to determine how effectively a five-variable model (selected from a set of 86 writing features) predicted essay scores. Results: The regression model in this study accounted for 57% of the essay score variability. The correlation (Pearson), the percentage of perfect matches, the percentage of adjacent matches (±2), and the quadratic weighted kappa between the resolved scores and predicted essay scores were 0.76, 10%, 49%, and 0.73, respectively. The results were measured on an integer scale of resolved essay scores ranging from 10 to 60. Discussion: When measuring the accuracy of an AES system, it is important to take several metrics into account to better understand how predicted essay scores are distributed relative to the distribution of human scores. Based on an average ranking across correlation, exact/adjacent agreement, quadratic weighted kappa, and distributional characteristics such as standard deviation and mean, this study’s regression model ranks 4th out of 10 AES systems.
Despite its relatively good rank, the predictions of the proposed AES system remain imprecise and do not even appear optimal for identifying poor-quality essays, treated as a binary condition (scores at or below a 65% threshold), with 71% precision and 92% recall. Conclusions: This study sheds light on the implementation and evaluation of a new, simple AES system comparable to the state of the art and reveals that state-of-the-art AES systems, which are generally obscure, are most likely concerned only with shallow assessment of text production features. Consequently, the authors advocate greater transparency in the development and publication of AES systems. In addition, the relationship between the proportion of essay score variability explained and the inter-rater agreement level should be further investigated to better represent how the level of agreement changes when a new variable is added to a regression model. This study should also be replicated at a larger scale in several different writing settings for more robust results.
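The agreement metrics named in this abstract (Pearson correlation, exact and adjacent agreement, quadratic weighted kappa, and precision/recall for a binary low-score condition) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy scores, and the cutoff value are hypothetical, and the sketch assumes that adjacent agreement counts any pair of scores within the tolerance, exact matches included.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def exact_and_adjacent(human, machine, tolerance=2):
    """Return (exact rate, adjacent rate).

    Assumption: "adjacent" means |difference| <= tolerance, exact matches included.
    """
    n = len(human)
    exact = sum(h == m for h, m in zip(human, machine)) / n
    adjacent = sum(abs(h - m) <= tolerance for h, m in zip(human, machine)) / n
    return exact, adjacent


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa for integer scores in [min_score, max_score]."""
    n = len(human)
    span = max_score - min_score
    observed = Counter(zip(human, machine))  # joint counts of (human, machine)
    hist_h, hist_m = Counter(human), Counter(machine)  # marginal counts
    disagreement = 0.0  # weighted observed disagreement
    by_chance = 0.0     # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = ((i - j) / span) ** 2
            disagreement += weight * observed[(i, j)] / n
            by_chance += weight * hist_h[i] * hist_m[j] / (n * n)
    if by_chance == 0:  # degenerate case: both raters gave one identical score
        return 1.0
    return 1.0 - disagreement / by_chance


def precision_recall(human, machine, cutoff):
    """Precision and recall, treating scores at or below `cutoff` as positive."""
    pairs = list(zip(human, machine))
    tp = sum(h <= cutoff and m <= cutoff for h, m in pairs)
    fp = sum(h > cutoff and m <= cutoff for h, m in pairs)
    fn = sum(h <= cutoff and m > cutoff for h, m in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy scores on the 10-60 scale; these are NOT data from the study.
    human = [10, 20, 30, 40, 50, 60]
    machine = [12, 20, 33, 40, 48, 60]
    print(pearson_r(human, machine))
    print(exact_and_adjacent(human, machine))
    print(quadratic_weighted_kappa(human, machine, 10, 60))
    print(precision_recall(human, machine, cutoff=32))  # hypothetical cutoff
```

Reporting several of these values side by side, as the abstract recommends, helps show whether a high correlation hides a low rate of exact matches.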
-
Measuring the Written Language Disorder among Students with Attention Deficit Hyperactivity Disorder
Abstract
Background: Attention Deficit Hyperactivity Disorder (ADHD) is a mental health disorder. People diagnosed with ADHD are often inattentive (have difficulty focusing on a task for a considerable period), overly impulsive (make rash decisions), and hyperactive (move excessively, often at inappropriate times). ADHD is often diagnosed through psychiatric assessments with additional input from physical/neurological evaluations. Written Language Disorder (WLD) is a learning disorder. People diagnosed with WLD often make multiple spelling, grammar, and punctuation mistakes, write sentences that lack cohesion and topic flow, and have trouble completing written assignments. Typically, WLD is also diagnosed through psychological educational assessments with additional input from physical/neurological evaluations. Literature Review: Previous research has shown a link between ADHD and writing difficulties. Students with ADHD have an increased likelihood of having writing difficulties, and writing difficulties rarely occur without ADHD or another mental health disorder. However, the presence of writing difficulties does not necessarily indicate the presence of a WLD. Other physical and behavioral factors of ADHD can also contribute to a student having a WLD. Therefore, a statistical association between these factors (in conjunction with written performance) and WLD must first be established. Research Question: To determine the statistical association between WLD and the physical and behavioral aspects of ADHD that indicate writing difficulties, this research reviewed methodologies from the literature pertaining to contemporary diagnoses of writing difficulties in ADHD students and identified diagnostic methods that explicitly associate the presence of WLD with these writing difficulties among students with ADHD.
The results demonstrate the association between writing difficulties and WLD as it pertains to ADHD students, using an integrated computational model applied to data from a systematic review. These results will be validated in a future study that will employ the integrated computational model to measure WLD among students with ADHD. Methodology: To measure the association of WLD with ADHD among students, the authors created a novel computational model that integrates the outcomes of common screening methods for WLD (physical questionnaire, behavioral questionnaire, and written performance tasks) with common screening methods for ADHD (physical questionnaire, behavioral questionnaire, adult self-reporting scales, and reaction-based continuous performance tasks (CPTs)). The outcomes of these screening methods were fed into an artificial neural network (ANN), first to ‘artificially learn’ about measuring the prevalence of WLD among ADHD students and second to adjust the prevalence value based on information from different screening methods. This can be considered the priming of the ANN. The ANN model was then tested with data from previous studies about ADHD students who had writing difficulties. The ANN model was also tested with data from students without ADHD or WLD, to serve as a control. Results: The results show that physical, behavioral, and written performance attributes of ADHD students have a high correlation with WLD (r = 0.72 to 0.80) in comparison to control students (r = 0.20 to 0.30), substantiating the link between WLD and ADHD. It should be noted that, due to a lack of female participation, most studies in the literature only employed and reported on the relationship between WLD and ADHD for male participants. Discussion and Conclusion: By testing ADHD students and control students against the WLD criteria, the study shows a strong correlation between WLD and ADHD.
The accuracy of the results is limited in terms of a) sample size (average n = 88, mean age = 19, 8 studies used for a meta-analysis), b) analysis (the original studies reviewed ADHD factors first and WLD factors second), and c) causation (the study only reviews the prevalence of WLD in ADHD students, not causation). A clinical trial will validate the data and address some of these limitations in a future phase of the research. A computational causal model will be introduced in the discussion portion to illustrate how causation between writing metrics and WLD, as it pertains to ADHD, can be established. These results open the door to advancing pedagogical techniques in education, where students with ADHD and/or WLD could not only receive assistance for the behavioral aspects of their disorder but also expect assistance for its learning aspects, empowering them to succeed in their studies.