Section-Type constraints on the choice of linguistic mechanisms in research articles: a corpus-based approach

This thesis investigates the structure of research articles in the field of Computational Linguistics with the goal of establishing that a set of distinctive linguistic features is associated with each section type. The empirical results of the study are derived from the quantitative and qualitative evaluation of research articles from the ACL Anthology Corpus. More than 20,000 articles were analyzed in order to retrieve the target section types and extract a predefined set of linguistic features from them. Approximately 1,100 articles were found to contain all of the following five section types: abstract, introduction, related work, discussion, and conclusion. These articles were chosen for comparing the frequency of occurrence of the linguistic features across the section types. Using Natural Language Processing frameworks (Stanford CoreNLP and the Python library spaCy) together with scripts created by the author, the frequency scores of the features were retrieved and analyzed with state-of-the-art statistical techniques. The results show that each section type possesses an individual profile of linguistic features which are associated with it more or less strongly. These section-feature associations are shown to be derivable from the hypothesized purpose of each section type. Overall, the findings reported in this thesis provide insights into the writing strategies that authors employ to achieve the overall goal of the research paper. The results of the thesis can be applied in new state-of-the-art applications that assist academic writing and its evaluation by providing the user with more sophisticated, empirically based feedback on the relationship between linguistic mechanisms and text type.
In addition, the identification of text-type-specific linguistic characteristics (a text-feature mapping) can contribute to the development of more robust language-based models for disinformation detection.

Metadata
Author: Iverina Ivanova
URN: urn:nbn:de:hebis:30:3-743881
DOI: https://doi.org/10.21248/gups.74388
Place of publication: Frankfurt am Main
Referee: Gert Webelhuth, Frank Richter, Manfred Sailer
Document Type: Doctoral Thesis
Language: English
Date of Publication (online): 2023/06/29
Year of first Publication: 2021
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Granting Institution: Johann Wolfgang Goethe-Universität
Date of final exam: 2021/04/28
Release Date: 2023/06/29
Tag: Question Under Discussion; academic writing; discourse structure; distinctive linguistic features; section-feature mapping
Page Number: 145
Note: The datasets for this thesis are available at https://doi.org/10.25716/gude.1jnt-32xh
HeBIS-PPN: 509170897
Institutes: Neuere Philologien
Dewey Decimal Classification: 4 Language / 40 Language / 400 Language
Collections: University Publications
Licence (German): Creative Commons - CC BY - Attribution 4.0 International