We report on the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time, and date expressions. Subgrammars and gazetteers are shared as far as possible across the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
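The kind of evaluation described above (exact and partial matching of annotations, plus a user-defined mapping between label sets) can be illustrated with a minimal sketch. This is not the tool from the abstract: the `Span` type, the `evaluate` function, the overlap criterion for a partial match, and the label names are all assumptions made for illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A hypothetical annotation: token offsets [start, end) plus a label."""
    start: int
    end: int
    label: str


def overlaps(a: Span, b: Span) -> bool:
    # Two half-open intervals overlap if each starts before the other ends.
    return a.start < b.end and b.start < a.end


def evaluate(gold, predicted, label_map=None):
    """Count exact, partial, spurious and missed annotations.

    label_map is an assumed user-defined mapping from grammar output
    labels to the labels used in the gold annotation.
    """
    label_map = label_map or {}
    mapped = [Span(p.start, p.end, label_map.get(p.label, p.label))
              for p in predicted]

    remaining_gold = list(gold)
    remaining_pred = []
    exact = 0
    # First pass: exact matches (same offsets, same label).
    for p in mapped:
        if p in remaining_gold:
            remaining_gold.remove(p)
            exact += 1
        else:
            remaining_pred.append(p)

    partial = spurious = 0
    # Second pass: partial matches (same label, overlapping offsets).
    for p in remaining_pred:
        hit = next((g for g in remaining_gold
                    if g.label == p.label and overlaps(g, p)), None)
        if hit is None:
            spurious += 1
        else:
            remaining_gold.remove(hit)
            partial += 1

    return {"exact": exact, "partial": partial,
            "spurious": spurious, "missed": len(remaining_gold)}


gold = [Span(0, 2, "person"), Span(5, 8, "organization"), Span(10, 11, "location")]
predicted = [Span(0, 2, "PER"), Span(6, 8, "ORG"), Span(20, 21, "LOC")]
label_map = {"PER": "person", "ORG": "organization", "LOC": "location"}
result = evaluate(gold, predicted, label_map)
```

Running exact matching before partial matching keeps a prediction from claiming, through mere overlap, a gold annotation that another prediction matches exactly.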
At the end of last year, I designed an inquiry into the present state of linguistic typology in the form of a questionnaire. It was an attempt to cover the whole field by formulating the questions which seemed most relevant to it. This questionnaire is reproduced, without modifications, following this preface. In the first days of this year, it was sent to 33 linguists who I know are working in the field. The purpose was to form, on the basis of the responses received, a picture of convergences and divergences among trends in present-day linguistic typology. The idea was also to obtain an objective basis for my report on "The present state of linguistic typology", to be delivered at the XIIIth International Congress of Linguistics in Tokyo, 1982.
I shall use the precise term 'interlinear morphemic translation' (IMT) to designate the object of this study. [...] An IMT is a translation of a text in a language L1 to a string of elements taken from L2 where, ideally, each morpheme of the L1 text is rendered by a morpheme of L2 or a configuration of symbols representing its meaning and where the sequence of the units of the translation corresponds to the sequence of the morphemes which they render. [...] An IMT is needed whenever it is essential that the reader grasp the grammatical structure of the L1 text but is presumed to be so unfamiliar with L1 that he will not be able to do so merely with the aid of a normal translation and the context in which the text is cited. [...] The primary aim of an IMT is to make the grammatical structure of the L1 text transparent. The textual fluency of the IMT by standards of the L2 grammar is a subordinate aim at best.
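The ideal stated above, one unit of the translation per morpheme and in the same order, can be checked mechanically. The sketch below is a minimal illustration under assumptions the abstract does not make: words are separated by spaces, morphemes by hyphens, and the function names `check_alignment` and `render` are hypothetical.

```python
def morphemes(word):
    # Assumed convention: hyphens separate morphemes within a word.
    return word.split("-")


def check_alignment(source, gloss):
    """Return a list of problems; an empty list means the gloss line
    has one unit per source morpheme, in the same sequence."""
    src_words, gloss_words = source.split(), gloss.split()
    if len(src_words) != len(gloss_words):
        return [f"word count differs: {len(src_words)} vs {len(gloss_words)}"]
    problems = []
    for i, (s, g) in enumerate(zip(src_words, gloss_words), start=1):
        if len(morphemes(s)) != len(morphemes(g)):
            problems.append(
                f"word {i}: {s!r} has {len(morphemes(s))} morpheme(s), "
                f"{g!r} has {len(morphemes(g))}"
            )
    return problems


def render(source, gloss):
    """Print-ready two-line display with each word above its gloss."""
    pairs = list(zip(source.split(), gloss.split()))
    top = "  ".join(s.ljust(max(len(s), len(g))) for s, g in pairs)
    bottom = "  ".join(g.ljust(max(len(s), len(g))) for s, g in pairs)
    return top.rstrip() + "\n" + bottom.rstrip()


# German 'Hunde bellen' ("dogs bark"), segmented and glossed by hand.
ok = check_alignment("Hund-e bell-en", "dog-PL bark-3PL")
bad = check_alignment("Hund-e bell-en", "dogs bark-3PL")
display = render("Hund-e bell-en", "dog-PL bark-3PL")
```

The second call fails the check because "dogs" collapses two source morphemes into one unit, which reads fluently in L2 but hides the L1 structure.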
Recent developments in typology which put the notions of linguistic function and operation into the focus of interest and establish them as the ultimate base on which languages are comparable prove fruitful for contrastive linguistics. The functional approach is illustrated in a contrastive analysis of Persian and German relative clauses. In a sketch of the theory of the relative clause, four grammatical functions to be fulfilled by relative constructions are deduced, and the two languages are compared with respect to the various ways in which they realize them. Learning problems can thus be predicted with greater confidence, explained more satisfactorily, and remedied more efficiently, because they are seen as learners' attempts to transfer, besides the underlying functions and operations, which the languages have in common, the techniques of their realization, which they do not have in common.