Linguistik-Klassifikation
Document Type
- Preprint (48)
- Conference Proceeding (19)
- Part of a Book (9)
- Article (5)
- Book (5)
- Working Paper (4)
Language
- English (90)
Has Fulltext
- yes (90)
Is part of the Bibliography
- no (90)
Keywords
- Computerlinguistik (33)
- Japanisch (15)
- Deutsch (13)
- Syntaktische Analyse (9)
- Maschinelle Übersetzung (8)
- Multicomponent Tree Adjoining Grammar (8)
- Lexicalized Tree Adjoining Grammar (5)
- Semantik (5)
- Satzanalyse (4)
- Transkription (4)
- German (3)
- Grammatik (3)
- Range Concatenation Grammar (3)
- Software (3)
- Syntax (3)
- Tree Adjoining Grammar (3)
- Dialog (2)
- Englisch (2)
- Frage (2)
- Grammaires d’Arbres Adjoints (2)
- Höflichkeitsform (2)
- Kongress (2)
- MCTAG (2)
- Numerale (2)
- Suchmaschine (2)
- Tree Adjoining Grammar (2)
- Tree Description Grammar (2)
- speech tagging (2)
- Akustische Phonetik (1)
- Arabisch (1)
- Artikulatorische Phonetik (1)
- Auditive Phonetik (1)
- Automatentheorie (1)
- Automatische Spracherkennung (1)
- Benutzeroberfläche (1)
- Chinesisch (1)
- Computersimulation (1)
- Coreference annotation (1)
- Datenstruktur (1)
- Description Tree Grammar (1)
- Experimentelle Phonetik (1)
- Formale Sprache (1)
- Formalismes syntaxiques (1)
- Fremdsprache (1)
- Fuzzy-Logik (1)
- Generic NLP Architecture (1)
- Gesprochene Sprache (1)
- HPSG Parsing (1)
- IE (1)
- Korean (1)
- Koreanisch (1)
- Korpus <Linguistik> (1)
- LTAG (1)
- Lautsprache (1)
- Lexical Resource Semantics (1)
- Lexical Resource Semantics (1)
- Mittelchinesisch (1)
- Morphologie (1)
- Numerus (1)
- Online-Publikation (1)
- Ontologie <Wissensverarbeitung> (1)
- Partikel (1)
- Präposition (1)
- Romanian (1)
- Rumänisch (1)
- SYNtax-based Reference Annotation (1)
- Satzanalyse (1)
- Shallow NLP (1)
- Simple Range Concatenation Grammar (1)
- Sloppiness (1)
- Spracherwerb (1)
- Syntactic formalisms (1)
- TUSNELDA (1)
- Tarragona <2008> (1)
- Tree-Adjoining Grammar (1)
- Tübingen <2007> (1)
- Unordered Vector Grammar with Dominance Link (1)
- Vagheit (1)
- Vagueness (1)
- Visualisierung (1)
- Word Sense Disambiguation (1)
- XML (1)
- allemand (1)
- brouillage d’arguments (1)
- chunk parsing (1)
- computational semantics (1)
- coréen (1)
- formalismes grammaticaux (1)
- german (1)
- grammaires d’arbres (1)
- grammar formalism (1)
- lexicalized tree-adjoining grammar (1)
- memory-based learning (1)
- metagrammars (1)
- multicomponent rewriting (1)
- métagrammaires (1)
- ordre des mots (1)
- quantifier scope (1)
- robust parsing (1)
- role labeling (1)
- scrambling (1)
- similarity-based learning (1)
- time annotation (1)
- tree-based grammars (1)
- treebanking (1)
- underspecification (1)
- word order (1)
Institute
- Extern (80)
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for the automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, the knowledge base and the information extraction component. The originality of SOBA lies in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
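The store-then-link behaviour described above can be reduced to a minimal sketch (Python; the function name and knowledge-base layout are invented for illustration, not SOBA's actual interface):

```python
# Minimal sketch of a link-or-create step: a newly extracted entity is
# matched against the knowledge base before a fresh entry is created.
# (Names and KB layout are illustrative, not SOBA's actual API.)
def add_extraction(kb, entity_type, name):
    """Return the KB id for (type, name), reusing an existing entity if present."""
    key = (entity_type, name.lower())
    if key not in kb:
        kb[key] = "e%d" % (len(kb) + 1)
    return kb[key]

kb = {}
first = add_extraction(kb, "Player", "Ballack")    # creates entity e1
second = add_extraction(kb, "Player", "ballack")   # links to the same entity
print(first, second)
```

The point of the sketch is only the control flow: extraction results are interpreted against what the knowledge base already contains instead of always creating new entities.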
This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the system, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.
In the last decade, the Penn Treebank has become the standard data set for evaluating parsers. The fact that most parsers are evaluated solely on this specific data set leaves unanswered the question of how much these results depend on the annotation scheme of the treebank. In this paper, we investigate the influence that different decisions in the annotation schemes of treebanks have on parsing. The investigation is based on a comparison of two similar German treebanks, NEGRA and TüBa-D/Z, which are subsequently modified to allow a direct comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality, while a flat clause structure has a positive influence.
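The unary-node transformation at issue here can be illustrated with a short sketch (Python; bracketed trees are encoded as nested tuples, and the function name is ours, not taken from the paper):

```python
# Sketch: collapse unary chains in a constituency tree, one of the
# treebank transformations whose effect on parsing is discussed above.
# Trees are nested tuples (label, child, ...); leaves are word strings.

def collapse_unary(tree):
    """Remove unary chains X -> Y -> ... by keeping only the top label."""
    if isinstance(tree, str):          # leaf (a word)
        return tree
    label, *children = tree
    # Descend through a chain of unary non-terminal nodes.
    while len(children) == 1 and not isinstance(children[0], str):
        _, *children = children[0]
    return (label, *(collapse_unary(c) for c in children))

# Example: (S (NP (PN Peter)) (VP (V schlaeft)))
t = ("S", ("NP", ("PN", "Peter")), ("VP", ("V", "schlaeft")))
print(collapse_unary(t))
```

Applying such transformations to both treebanks is what makes their annotation decisions directly comparable.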
This paper develops a framework for TAG (Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches. Then, within this framework, an analysis of scope is proposed that accounts for the different scopal properties of quantifiers, adverbs, raising verbs and attitude verbs. Finally, by including situation variables in the semantics, different situation-binding possibilities are derived for different types of quantificational elements.
This paper profiles significant differences in syntactic distribution and in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z, a treebank of newspaper articles published in the German daily newspaper 'die tageszeitung' (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.
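As a toy illustration of the frequency-based comparison (Python; the tag sequences are invented stand-ins, not data from the treebanks):

```python
# Relative word-class frequencies as a simple genre signature.
from collections import Counter

def relative_freqs(tags):
    """Map each POS tag to its relative frequency in the tag sequence."""
    total = len(tags)
    return {tag: count / total for tag, count in Counter(tags).items()}

# Invented tag sequences standing in for spoken vs. written samples:
# spoken dialog tends toward pronouns and particles, newspaper text
# toward nouns and adjectives.
spoken = ["PRON", "VERB", "PART", "PRON", "VERB"]
written = ["NOUN", "ADJ", "NOUN", "VERB", "NOUN"]
print(relative_freqs(spoken)["PRON"])    # 0.4
print(relative_freqs(written)["NOUN"])   # 0.6
```

Comparing such profiles across corpora is the kind of signal the paper uses to separate genres.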
Multicomponent Tree Adjoining Grammar (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. In other words, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. Therefore, in this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via properties of the derivation trees that the MCTAG licenses.
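As an informal illustration of a property that can be read off the finished derivation tree rather than the derivation process (Python; the encoding and function are ours, not the paper's formal definition):

```python
# Sketch: check a set-completeness property on an MCTAG derivation tree.
# Each derivation-tree node is recorded as (elementary_tree_name, set_id),
# where set_id is None for trees that do not belong to a multicomponent set.
from collections import defaultdict

def sets_complete(derivation_nodes, set_sizes):
    """True iff every multicomponent set used in the derivation
    contributes all of its members (none are left out)."""
    used = defaultdict(set)
    for tree_name, set_id in derivation_nodes:
        if set_id is not None:
            used[set_id].add(tree_name)
    return all(len(members) == set_sizes[sid] for sid, members in used.items())

# A set S1 = {alpha1, alpha2}: using both members satisfies the
# constraint, using only one member violates it.
nodes_ok = [("root", None), ("alpha1", "S1"), ("alpha2", "S1")]
nodes_bad = [("root", None), ("alpha1", "S1")]
print(sets_complete(nodes_ok, {"S1": 2}))    # True
print(sets_complete(nodes_bad, {"S1": 2}))   # False
```

The sketch only conveys the shift in perspective: conditions are stated on the derivation tree as a static object, without reference to the order in which the set members were added.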
When a statistical parser is trained on one treebank, it is usually tested on another portion of the same treebank, partly because a comparable annotation format is needed for testing. But the user of a parser may not be interested in parsing sentences from the same newspaper over and over, and may even want syntactic annotations for a slightly different text type. Gildea (2001), for instance, found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser trained only on the Brown corpus, although the latter has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ. This leads us to the following questions, which we address in this paper: Is there a difference in the usefulness of techniques for improving parser performance between the same-corpus and the different-corpus case? Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation? To address these questions, we compared the quality of the parses of a hand-crafted constraint-based parser and of a statistical PCFG-based parser trained on a treebank of German newspaper text.
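The parse-quality comparisons behind such results are typically PARSEVAL-style labeled bracketing scores; a compact sketch (Python; the tree encoding and function names are ours):

```python
# PARSEVAL-style labeled bracketing F1, sketched over trees encoded as
# nested tuples (label, child, ...) with words as plain strings.
def spans(tree, start=0):
    """Return (next_position, [(label, start, end), ...]) for a tree."""
    if isinstance(tree, str):          # a word occupies one position
        return start + 1, []
    label, *children = tree
    pos, out = start, []
    for child in children:
        pos, sub = spans(child, pos)
        out.extend(sub)
    out.append((label, start, pos))
    return pos, out

def parseval_f1(gold, test):
    """Harmonic mean of labeled-span precision and recall."""
    g = set(spans(gold)[1])
    t = set(spans(test)[1])
    match = len(g & t)
    p, r = match / len(t), match / len(g)
    return 2 * p * r / (p + r) if p + r else 0.0

gold = ("S", ("NP", "Peter"), ("VP", "sleeps"))
test = ("S", ("NP", "Peter"), "sleeps")       # parser missed the VP node
print(round(parseval_f1(gold, test), 2))      # 0.8
```

Scoring both parsers on in-domain and out-of-domain test sets with the same metric is what makes the cross-corpus comparison meaningful.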
This paper proposes an annotation scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g., social rank and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation output. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to the referents of the predicate, calibrating the ranks, and connecting referents with their predicates.
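The four annotation steps listed above might be captured in a record like the following (Python; all field names and values are invented for illustration and are not the paper's actual scheme):

```python
# Invented example record for one honorific predicate and its referents.
annotation = {
    "predicate": "irassharu",            # step 1: identified honorific verb
                                         # ('to come/be', respectful form)
    "honorific_type": "respectful",
    "referent_ranks": {"speaker": 0,     # steps 2-3: ranks assigned to the
                       "subject": 2},    # referents, then calibrated
    "links": [("subject", "irassharu")], # step 4: referent-predicate links
}
print(annotation["referent_ranks"]["subject"])
```

A zero-pronoun resolver could then prefer referents whose rank is consistent with the honorific type of the predicate.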
While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
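A toy composition in the spirit of the flat semantic representations mentioned above (Python; this is our illustration, not the actual MRS output of the grammar):

```python
# Compose a numeral-classifier phrase into a flat, MRS-like list of
# relations sharing one instance variable,
# e.g. 'hon san-satsu' (book three-CL) ~ "three (bound) volumes of books".
def classifier_phrase(num, classifier, noun):
    x = "x1"                         # the shared instance variable
    return [
        (noun + "_n", x),            # relation contributed by the noun
        ("card", x, num),            # cardinality from the numeral
        ("cl_" + classifier, x),     # sortal constraint from the classifier
    ]

print(classifier_phrase(3, "satsu", "hon"))
```

The sortal constraint is what rules out combinations like the classifier for bound volumes applying to, say, animals.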
The Deep Linguistic Processing with HPSG Initiative (DELPH-IN) provides the infrastructure needed to produce open-source semantic-transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources, including parsers, generators, bidirectional grammars and a transfer engine.
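The parse-transfer-generate organisation of such systems, reduced to a toy sketch (Python; real DELPH-IN systems transfer semantic structures such as MRS, not word lists, and the mini-lexicon here is invented):

```python
# Toy transfer pipeline: the analysis and generation steps are stand-ins
# for a parser and a generator; only the three-stage shape is the point.
def translate(sentence, transfer_lexicon):
    analysis = sentence.split()                                   # "parsing"
    transferred = [transfer_lexicon.get(w, w) for w in analysis]  # transfer
    return " ".join(w for w in transferred if w)                  # "generation"

lexicon = {"inu": "the dog", "ga": "", "hoeru": "barks"}  # invented entries
print(translate("inu ga hoeru", lexicon))                 # the dog barks
```

Because the grammars are bidirectional, the same resources that parse the source language can generate the target language, which is what makes assembling such a prototype from existing components feasible.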