TüSBL : a similarity-based chunk parser for robust syntactic processing

  • Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. The TüSBL parser extends current chunk parsing techniques with a tree-construction component that expands partial chunk parses into complete tree structures, including recursive phrase structure as well as function-argument structure. TüSBL's tree-construction algorithm relies on techniques from memory-based learning that allow similarity-based classification of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank. A quantitative evaluation of TüSBL has been conducted using a semi-automatically constructed treebank of German that consists of approximately 67,000 fully annotated sentences. The basic PARSEVAL measures were used, although they were developed for parsers whose main goal is a complete analysis that spans the entire input. This runs counter to the basic philosophy underlying TüSBL, whose main goal is robustness of partially analyzed structures.
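
The tension between PARSEVAL and partial analyses can be illustrated with a minimal sketch of labeled bracket precision and recall. This is not the evaluation code used for TüSBL. The function name `parseval`, the `(label, start, end)` bracket representation, and the toy brackets are all assumptions made for illustration.

```python
from collections import Counter

def parseval(gold, predicted):
    """Labeled bracket precision, recall, and F1 for a single sentence.

    Each bracket is a hypothetical (label, start, end) tuple over token
    positions; this representation is an assumption, not the paper's format.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket matches only if label and span agree.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Invented example: the gold tree spans the whole input, while the
# partial analysis contains only two correct noun chunks.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
partial = [("NP", 0, 2), ("NP", 3, 5)]

p, r, f = parseval(gold, partial)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case every bracket the partial analysis proposes is correct, yet recall drops because the brackets that span the full input are missing. That is the kind of penalty a robustness-oriented parser incurs under measures designed for complete analyses.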

Metadata
Author:Sandra Kübler, Erhard Hinrichs
URN:urn:nbn:de:hebis:30-1110508
URL:http://cl.indiana.edu/~skuebler/papers/hlt01.ps
Editor:Morgan Kaufmann
Document Type:Preprint
Language:English
Year of Completion:2001
Year of first Publication:2001
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2008/10/21
Tag:chunk parsing; robust parsing; similarity-based learning
GND Keyword:Satzanalyse (sentence analysis)
Page Number:6
Note:
Published in: Morgan Kaufmann (ed.): Proceedings of the First International Conference on Human Language Technology Research, HLT 2001, San Diego, California, USA, March 18-21, 2001
Source:http://jones.ling.indiana.edu/~skuebler/papers/hlt01.ps ; Proceedings of HLT 2001, (San Diego, California 2001).
HeBIS-PPN:206753691
Institutes:Department not specified / External
Dewey Decimal Classification:4 Language / 40 Language / 400 Language
Collections:Linguistics
Linguistics Classification:Computational linguistics
Licence (German):Deutsches Urheberrecht (German copyright law)