Bochumer linguistische Arbeitsberichte : BLA
Eds.: Stefanie Dipper; Björn Rothstein
17
NLP4CMC III : 3rd workshop on natural language processing for computer-mediated communication
(2016)
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is to represent the blog discourse structure and the relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information: information that is directly provided in the blog, information that is indirectly provided, and information that can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g., title of the blog post, name of the blogger, etc.) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g., for authorship detection).
18
The Shared Task on Source and Target Extraction from Political Speeches (STEPS) first ran in 2014 and is organized by the Interest Group on German Sentiment Analysis (IGGSA). This volume presents the proceedings of the workshop of the second iteration of the shared task. The workshop was held at KONVENS 2016 at Ruhr-University Bochum on September 22, 2016.
As in the first edition of the shared task, the main focus of STEPS was on fine-grained sentiment analysis. It offered a full task as well as two subtasks for the extraction of Subjective Expressions and/or their respective Sources and Targets.
In order to make the task more accessible, the annotation schema was revised for this year’s edition and an adjudicated gold standard was used for the evaluation. In contrast to the pilot task, this iteration provided training data for the participants, opening the shared task to systems based on machine learning approaches.
The gold standard as well as the evaluation tool have been made publicly available to the research community via the STEPS website.
We would like to thank the GSCL for their financial support in annotating the 2014 test data, which were available as training data in this iteration. Special thanks also go to Stephanie Köser for her support in preparing and carrying out the annotation of this year’s test data. Finally, we would like to thank all the participants for their contributions and discussions at the workshop.