A Text Analytic Approach to Classifying Document Types
Abstract
Background: While it is commonly recognized that almost every work and research discipline uses its own taxonomy, the language used within a specific discipline may also vary depending on numerous factors, including the desired effect of the information being communicated and the intended audience. Different audiences are reached by publishing information, including research results, in different types of outlets such as newspapers, newsletters, magazines, websites, and journals. Prior research has shown that undergraduate and graduate students, as well as faculty, may have difficulty locating information across publication outlet types (e.g., magazines, newspapers, journals). The type of publication may affect both the ease of understanding and the confidence placed in the acquired information. A text analytics tool has been developed that classifies the source of research as a newsletter (used as a substitute for newspaper articles), a magazine, or an academic journal article. The tool is intended to assist students, faculty, and researchers in identifying the likely source type of information and in classifying their own writing with respect to these publication outlet types.
Literature Review: Literature on information literacy is discussed, as it forms the motivation for the reported research. Additionally, prior research on text mining and text analytics is examined to better understand the methodology employed, including a review of the original Scale of Theoretical and Applied Research system, which is adapted for the current research.
Research Questions: The primary research question is: Can a text mining and text analytics approach accurately determine whether the most probable publication source type is a newsletter, a magazine, or a journal?
Methodology: A text mining and text analytics algorithm, STAR’ (System for Text Analytics-based Ranking), was developed from a previously researched text mining tool, STAR (Scale of Theoretical and Applied Research), which was used to classify articles as theoretical or applied research. The new text mining method, STAR’, analyzes the language used in manuscripts to determine the type of publication. The method first mines all words from documents of each publication source type to determine a keyword corpus. The corpus is then used in a text analytics process to classify full newsletters, magazine articles, and journal articles with respect to their publication source. All newsletters, magazine articles, and journal articles are from the library and information sciences (LIS) domain.
Results: The STAR’ text analytics method was evaluated as a proof of concept on a specific LIS organizational newsletter, as well as on articles from a single LIS magazine and a single LIS journal. STAR’ classified these newsletters, magazine articles, and journal articles with 100% accuracy. Random samples from another similar LIS newsletter and a different LIS journal were also evaluated to examine the robustness of the STAR’ method in the initial proof of concept. Following the positive results of the proof of concept, additional journal, magazine, and newsletter articles were used to evaluate the generalizability of STAR’. The second-round results were very positive for differentiating journals and newsletters from other publication types, but revealed potential issues in distinguishing magazine articles from other types of publications.
Discussion: STAR’ demonstrates that the language used for transferring information within a specific discipline differs significantly depending on the intended recipients of the research knowledge. Further work is needed to examine language usage specific to magazine articles.
Conclusions: The STAR’ method may be used by students and faculty to identify the likely source of research or discipline-specific information. This may improve trust in the reliability of information, because different levels of rigor are applied to different types of publications. Additionally, students, faculty, or researchers may use the STAR’ classifications to determine the most appropriate type of outlet, and correspondingly the most appropriate audience, for the information reported in their own manuscripts. Publishing in the most relevant outlet type improves the chance of successfully sharing information with appropriate audiences who will deem it reliable.
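The two-step process described under Methodology (mine words per source type into a keyword corpus, then classify a full document against that corpus) can be illustrated with a minimal Python sketch. This is not the published STAR’ algorithm: the abstract does not specify how keywords are selected or scored, so the exclusivity rule, the count-based score, the function names, and the toy training texts below are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_keyword_corpus(samples_by_type, top_n=50):
    """For each source type, keep the most frequent words seen in no other type.

    Assumption: a keyword is a word exclusive to one source type. The
    abstract does not say how STAR' actually selects its keywords.
    """
    counts = {
        source: Counter(word for doc in docs for word in tokenize(doc))
        for source, docs in samples_by_type.items()
    }
    corpus = {}
    for source, counter in counts.items():
        # Collect every word used by any of the other source types.
        others = set()
        for other_source, other_counter in counts.items():
            if other_source != source:
                others.update(other_counter)
        exclusive = [w for w, _ in counter.most_common() if w not in others]
        corpus[source] = set(exclusive[:top_n])
    return corpus


def classify(text, corpus):
    """Return the source type whose keywords occur most often, plus all scores."""
    words = Counter(tokenize(text))
    scores = {
        source: sum(words[w] for w in keywords)
        for source, keywords in corpus.items()
    }
    return max(scores, key=scores.get), scores


# Invented toy texts, for illustration only (not data from the study).
training = {
    "newsletter": [
        "Members are invited to the annual chapter picnic and meeting.",
        "The chapter welcomes new members at the spring luncheon.",
    ],
    "magazine": [
        "Five tips librarians love for brightening a reading corner.",
        "Our columnist shares tips for a lively story hour.",
    ],
    "journal": [
        "We report a regression analysis of the survey data and test the hypothesis.",
        "The methodology and hypothesis are evaluated using regression.",
    ],
}

corpus = build_keyword_corpus(training)
label, scores = classify("The hypothesis was tested with a regression model.", corpus)
print(label, scores)
```

In this toy setup, words shared across source types (such as "the") are excluded from every keyword set, so only type-specific vocabulary contributes to the score. A real system would need far larger samples and a more careful weighting scheme.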
- Journal: Journal of Writing Analytics
- Published: 2017-01-01
- DOI: 10.37514/jwa-j.2017.1.1.06
- Open Access: OA PDF, Gold