Linguistik-Klassifikation
This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper "die tageszeitung" (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.
Recent approaches to Word Sense Disambiguation (WSD) generally fall into two classes: (1) information-intensive approaches and (2) information-poor approaches. Our hypothesis is that for memory-based learning (MBL), a reduced amount of data is more beneficial than the full range of features used in the past. Our experiments show that MBL combined with a restricted set of features and a feature selection method that minimizes the feature set leads to competitive results, outperforming all systems that participated in the SENSEVAL-3 competition on the Romanian data. Thus, with this specific method, a tightly controlled feature set improves the accuracy of the classifier, reaching 74.0% in the fine-grained and 78.7% in the coarse-grained evaluation.
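The idea of pairing memory-based learning with a minimized feature set can be illustrated with a small sketch. This is not the authors' system: it assumes a plain k-nearest-neighbor classifier with an overlap distance, leave-one-out evaluation, and greedy backward elimination as the feature selection method, none of which the abstract specifies. The function names and the four-instance toy data are hypothetical and chosen only to make the effect visible.

```python
from collections import Counter

def overlap_distance(a, b, features):
    """Count the selected feature positions where two instances differ."""
    return sum(1 for i in features if a[i] != b[i])

def knn_predict(train, query, features, k=1):
    """Memory-based prediction: majority label among the k nearest stored instances."""
    ranked = sorted(train, key=lambda item: overlap_distance(item[0], query, features))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(data, features, k=1):
    """Leave-one-out accuracy using only the selected features."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, features, k) == y
    return hits / len(data)

def backward_elimination(data, n_features, k=1):
    """Greedily drop one feature at a time while accuracy does not decrease."""
    selected = list(range(n_features))
    best = loo_accuracy(data, selected, k)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for f in list(selected):
            trial = [g for g in selected if g != f]
            score = loo_accuracy(data, trial, k)
            if score >= best:
                selected, best, improved = trial, score, True
                break
    return selected, best

# Hypothetical toy data: feature 0 is an informative context word,
# feature 1 is noise that is unrelated to the sense label.
data = [
    (("river", "x"), "shore"),
    (("river", "y"), "shore"),
    (("money", "x"), "finance"),
    (("money", "y"), "finance"),
]

full_accuracy = loo_accuracy(data, [0, 1])
selected, reduced_accuracy = backward_elimination(data, n_features=2)
print(full_accuracy, selected, reduced_accuracy)
```

On this toy data the noisy feature creates distance ties that mislead the nearest-neighbor vote, so removing it raises leave-one-out accuracy. Real experiments would use held-out data and a much richer feature inventory.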