OPUS 4 | Search

Improving data quality by rules: a numismatic example (2017)

The archaeological data dealt with in our database solution Antike Fundmünzen in Europa (AFE), which records finds of ancient coins, is entered by humans. Based on the Linked Open Data (LOD) approach, we link our data to Nomisma.org concepts, as well as to other resources like Online Coins of the Roman Empire (OCRE). Since information such as denomination, material, etc. is recorded for each single coin, this information should be identical for coins of the same type. Unfortunately, this is not always the case, mostly due to human errors. Based on rules that we implemented, we were able to make use of this redundant information in order to detect possible errors within AFE, and were even able to correct errors in Nomimsa.org. However, the approach had the weakness that it was necessary to transform the data into an internal data model. In a second step, we therefore developed our rules within the Linked Open Data world. The rules can now be applied to datasets following the Nomisma. org modelling approach, as we demonstrated with data held by Corpus Nummorum Thracorum (CNT). We believe that the use of methods like this to increase the data quality of individual databases, as well as across different data sources and up to the higher levels of OCRE and Nomisma.org, is mandatory in order to increase trust in them.

Resolving anaphoric references on deficient syntactic descriptions (1997)

Stuckardt, Roland

Syntactic coindexing restrictions are by now known to be of central importance to practical anaphor resolution approaches. Since, in particular due to structural ambiguity, the assumption of the availability of a unique syntactic reading proves to be unrealistic, robust anaphor resolution relies on techniques to overcome this deficiency. In this paper, two approaches are presented which generalize the verification of coindexing constraints to de cient descriptions. At first, a partly heuristic method is described, which has been implemented. Secondly, a provable complete method is specified. It provides the means to exploit the results of anaphor resolution for a further structural disambiguation. By rendering possible a parallel processing model, this method exhibits, in a general sense, a higher degree of robustness. As a practically optimal solution, a combination of the two approaches is suggested.

Anaphor resolution and the scope of syntactic constraints (1996)

Stuckardt, Roland

An anaphor resolution algorithm is presented which relies on a combination of strategies for narrowing down and selecting from antecedent sets for re exive pronouns, nonre exive pronouns, and common nouns. The work focuses on syntactic restrictions which are derived from Chomsky's Binding Theory. It is discussed how these constraints can be incorporated adequately in an anaphor resolution algorithm. Moreover, by showing that pragmatic inferences may be necessary, the limits of syntactic restrictions are elucidated.

Können Computer Texte verstehen? Maschinelle Sprachverarbeitung aus den Perspektiven von Philosophischer Hermeneutik, Kognitionswissenschaften und Künstlicher Intelligenz (1999)

Stuckardt, Roland

Vortragsfolien: Vortragsreihe "Unplugged Heads", Institut für Neue Medien, Frankfurt am Main, 11. Mai 1999.

Coreference-Based Summarization and Question Answering: a Case for High Precision Anaphor Resolution (2003)

Stuckardt, Roland

Approaches to Text Summarization and Question Answering are known to benefit from the availability of coreference information. Based on an analysis of its contributions, a more detailed look at coreference processing for these applications will be proposed: it should be considered as a task of anaphor resolution rather than coreference resolution. It will be further argued that high precision approaches to anaphor resolution optimally match the specific requirements. Three such approaches will be described and empirically evaluated, and the implications for Text Summarization and Question Answering will be discussed.

Machine-learning-based vs. manually designed approaches to anaphor resolution: the best of two worlds (2002)

Stuckardt, Roland

In the last years, much effort went into the design of robust anaphor resolution algorithms. Many algorithms are based on antecedent filtering and preference strategies that are manually designed. Along a different line of research, corpus-based approaches have been investigated that employ machine-learning techniques for deriving strategies automatically. Since the knowledge-engineering effort for designing and optimizing the strategies is reduced, the latter approaches are considered particularly attractive. Since, however, the hand-coding of robust antecedent filtering strategies such as syntactic disjoint reference and agreement in person, number, and gender constitutes a once-for-all effort, the question arises whether at all they should be derived automatically. In this paper, it is investigated what might be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, in a once-for-all fashion, and deriving the (potentially genre-specific) antecedent selection strategies automatically by applying machine-learning techniques. An anaphor resolution system ROSANA-ML, which follows this paradigm, is designed and implemented. Through a series of formal evaluations, it is shown that, while exhibiting additional advantages, ROSANAML reaches a performance level that compares with the performance of its manually designed ancestor ROSANA.

Three Algorithms for Competence-Oriented Anaphor Resolution (2004)

Stuckardt, Roland

In the last decade, much effort went into the design of robust third-person pronominal anaphor resolution algorithms. Typical approaches are reported to achieve an accuracy of 60-85%. Recent research addresses the question of how to deal with the remaining difficult-toresolve anaphors. Lappin (2004) proposes a sequenced model of anaphor resolution according to which a cascade of processing modules employing knowledge and inferencing techniques of increasing complexity should be applied. The individual modules should only deal with and, hence, recognize the subset of anaphors for which they are competent. It will be shown that the problem of focusing on the competence cases is equivalent to the problem of giving precision precedence over recall. Three systems for high precision robust knowledge-poor anaphor resolution will be designed and compared: a ruleset-based approach, a salience threshold approach, and a machine-learning-based approach. According to corpus-based evaluation, there is no unique best approach. Which approach scores highest depends upon type of pronominal anaphor as well as upon text genre.

Lower bounds for multi-pass processing of multiple data streams (2009)

Schweikardt, Nicole

This paper gives a brief overview of computation models for data stream processing, and it introduces a new model for multi-pass processing of multiple streams, the so-called mp2s-automata. Two algorithms for solving the set disjointness problem with these automata are presented. The main technical contribution of this paper is the proof of a lower bound on the size of memory and the number of heads that are required for solving the set disjointness problem with mp2s-automata.

The process complexity and effective random tests (1972)

Schnorr, Claus Peter

We propose a variant of the Kolmogorov concept of complexity which yields a common theory of finite and infinite random sequences. The process complexity does not oscillate. We establish some concepts of effective tests which are proved to be equivalent.

Lattice reduction by random sampling and birthday methods (2003)

Schnorr, Claus Peter

We present a novel practical algorithm that given a lattice basis b1, ..., bn finds in O(n exp 2 *(k/6) exp (k/4)) average time a shorter vector than b1 provided that b1 is (k/6) exp (n/(2k)) times longer than the length of the shortest, nonzero lattice vector. We assume that the given basis b1, ..., bn has an orthogonal basis that is typical for worst case lattice bases. The new reduction method samples short lattice vectors in high dimensional sublattices, it advances in sporadic big jumps. It decreases the approximation factor achievable in a given time by known methods to less than its fourth-th root. We further speed up the new method by the simple and the general birthday method. n2

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

41 search hits