Refine
Year of publication
- 2000 (4) (remove)
Document Type
- Preprint (4) (remove)
Has Fulltext
- yes (4)
Is part of the Bibliography
- no (4) (remove)
Keywords
- Automatische Sprachanalyse (1)
- Deutsch (1)
- Fehlererkennung (1)
- Fehlertoleranz (1)
- German (1)
- Ilmenau <2000> (1)
- Kongress (1)
- Natürlichsprachiges System (1)
- Robustheit (1)
- Sprachverarbeitung (1)
Institute
- Extern (4) (remove)
Das Chunkparsing bietet einen besonders vielversprechenden Ansatz zum robusten, partiellen Parsing mit dem Ziel einer breiten Datenabdeckung. Ziel beim Chunkparsing ist eine partielle, nicht-rekursive syntaktische Struktur. Dieser extrem effiziente Parsing-Ansatz läßt sich als Kaskade endlicher Transducer realisieren. In diesem Beitrag wird TüSBL vorgestellt, ein System, bei dem die Eingabe aus spontaner, gesprochener Spache besteht, die dem Parser in Form eines Worthypothesengraphen aus einem Spracherkenner zur Verfügung gestellt wird. Chunkparsing ist für eine solche Anwendung besonders geeignet, da es fragmentarische oder nicht wohlgeformte Äußerungen robust behandeln kann. Des weiteren wird eine Baumkonstruktionskomponente vorgestellt, die die partiellen Chunkstrukturen zu vollständigen Bäumen mit grammatischen Funktionen erweitert. Das System wird anhand manuell überprüfter Systemeingaben evaluiert, da sich die üblichen Evaluationsparameter hierfür nicht eignen.
In this paper, we investigate the role of sub-optimality in training data for part-of-speech tagging. In particular, we examine to what extent the size of the training corpus and certain types of errors in it affect the performance of the tagger. We distinguish four types of errors: If a word is assigned a wrong tag, this tag can belong to the ambiguity class of the word (i.e. to the set of possible tags for that word) or not; furthermore, the major syntactic category (e.g. "N" or "V") can be correctly assigned (e.g. if a finite verb is classified as an infinitive) or not (e.g. if a verb is classified as a noun). We empirically explore the decrease of performance that each of these error types causes for different sizes of the training set. Our results show that those types of errors that are easier to eliminate have a particularly negative effect on the performance. Thus, it is worthwhile concentrating on the elimination of these types of errors, especially if the training corpus is large.
This paper proposes a corpus encoding standard that meets the needs of linguistic research using a variety of linguistic data structures. The standard was developed in SFB 441, a research project at the University of Tuebingen. The principal concern of SFB 441 are the empirical data structures which feed into linguistic theory building. SFB 441 consists of several projects, most of which are building corpora to empirically investigate various linguistic phenomena in various languages (e.g. modal verbs in German, forms of address and politeness in Russian). These corpora will form the components of the "Tuebingen collection of reusable, empirical, linguistic data structures (TUSNELDA)". The TUSNELDA annotation standard aims at providing a uniform encoding scheme for all subcorpora and texts of TUSNELDA such that they can be processed with uniform standardized tools. To guarantee maximal reusability we use XML for encoding. Previous SGML standards for text encoding were provided by the Text Encoding Initiative (TEI) and the Expert Advisory Group on Language Engineering Standards (Corpus Encoding Standard, CES). The TUSNELDA standard is based on TEI and XCES (XML version of CES) but takes into account the specific needs of the SFB projects, i.e. the peculiarities of the examined languages and linguistic phenomena.
Existing analyses of German scrambling phenomena within TAG-related formalisms all use non-local variants of TAG. However, there are good reasons to prefer local grammars, in particular with respect to the use of the derivation structure for semantics. Therefore this paper proposes to use local TDGs, a TAG-variant generating tree descriptions that shows a local derivation structure. However the construction of minimal trees for the derived tree descriptions is not subject to any locality constraint. This provides just the amount of non-locality needed for an adequate analysis of scrambling. To illustrate this a local TDG for some German scrambling data is presented.