Treebank profiling of spoken and written German

This paper profiles significant differences in syntactic distribution and in word class frequencies between two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z, a treebank of articles published in the German daily newspaper "die tageszeitung" (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.
Author: Erhard Hinrichs, Sandra Kübler
Document Type: Preprint
Year of Completion: 2005
Year of first Publication: 2005
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Release Date: 2008/11/03
Page Number: 12
Published in: Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT). Barcelona, Spain, December 2005, pp. 65-76
Source: http://jones.ling.indiana.edu/~skuebler/papers/GermanEstimation.pdf
