Refine
Year of publication
Document Type
- Conference Proceeding (41) (remove)
Has Fulltext
- yes (41)
Is part of the Bibliography
- no (41) (remove)
Keywords
Institute
- Informatik (41) (remove)
The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature hidden in German libraries for over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with 80.23% F-score. In this sense, our paper lays the foundations for future work in the field of information extraction in biology texts.
This paper proposes a new approach for the encoding of images by only a few important components. Classically, this is done by the Principal Component Analysis (PCA). Recently, the Independent Component Analysis (ICA) has found strong interest in the neural network community. Applied to images, we aim for the most important source patterns with the highest occurrence probability or highest information called principal independent components (PIC). For the example of a synthetic image composed by characters this idea selects the salient ones. For natural images it does not lead to an acceptable reproduction error since no a-priori probabilities can be computed. Combining the traditional principal component criteria of PCA with the independence property of ICA we obtain a better encoding. It turns out that this definition of PIC implements the classical demand of Shannon’s rate distortion theory.
The early prediction of mortality is one of the unresolved tasks in intensive care medicine. This contribution models medical symptoms as observations cased by transitions between hidden markov states. Learning the underlying state transition probabilities results in a prediction probability success of about 91%. The results are discussed and put in relation to the model used. Finally, the rationales for using the model are reflected: Are there states in the septic shock data?
In its first part, this contribution reviews shortly the application of neural network methods to medical problems and characterizes its advantages and problems in the context of the medical background. Successful application examples show that human diagnostic capabilities are significantly worse than the neural diagnostic systems. Then, paradigm of neural networks is shortly introduced and the main problems of medical data base and the basic approaches for training and testing a network by medical data are described. Additionally, the problem of interfacing the network and its result is given and the neuro-fuzzy approach is presented. Finally, as case study of neural rule based diagnosis septic shock diagnosis is described, on one hand by a growing neural network and on the other hand by a rule based system. Keywords: Statistical Classification, Adaptive Prediction, Neural Networks, Neurofuzzy, Medical Systems
Poster presentation: Calcium plays a pivotal role in relaying electrical signals of the cell to subcellular compartments, such as the nucleus. Since this one ion type is used by the cell for many processes a neuron needs to establish finely tuned calcium pathways in order to be able to differentiate multiple tasks, [1-3].
While it is known that neurons can actively change their shape upon neuronal activity, [4-7], we here present novel findings of activity-regulated nuclear morphology, [8,9]. With the help of an experimental and computational modeling approach, we show that hippocampal neurons can change the previously spherical shape of their nuclei to complex and infolded morphologies. This morphology regulation is demonstrated to be regulated by NMDA-receptor gated calcium, while synaptic and extra-synaptic NMDA-receptors elicit opposing effects on nuclear morphology, [8].
The structural alterations of the cell nucleus have significant effects on nuclear calcium dynamics. Compartmentalization of the nucleus, due to membrane infoldings, changes calcium frequencies, amplitudes and spatial distributions, [8,10]. Since these parameters have been shown to control downstream events towards gene transcription, [11,12], the results elucidate the cellular control of nuclear function with the help of morphology modulation. With respect to processes downstream of calcium, we show that histone H3 phosphorylation is closely linked to nuclear morphology. Investigating the nuclear morphologies of hippocampal neurons, two major classes were identified [9,10]. One class contains non-infolded nuclei that have the function of calcium signal integrators, while the other class contains highly infolded nuclei, which function as frequency detectors of nuclear calcium, [10].
Extending this interdisciplinary approach of investigating structure/function relationships in neurons, the effects of cellular morphology – as well as the morphology of the endoplasmic reticulum and other organelles – on neuronal calcium signals is currently being investigated. This endeavor makes use of highly detailed, three-dimensional models of neuronal calcium dynamics, including the three-dimensional morphology of the cell and its organelles.
We empirically investigate algorithms for solving Connected Components in the external memory model. In particular, we study whether the randomized O(Sort(E)) algorithm by Karger, Klein, and Tarjan can be implemented to compete with practically promising and simpler algorithms having only slightly worse theoretical cost, namely Borůvka’s algorithm and the algorithm by Sibeyn and collaborators. For all algorithms, we develop and test a number of tuning options. Our experiments are executed on a large set of different graph classes including random graphs, grids, geometric graphs, and hyperbolic graphs. Among our findings are: The Sibeyn algorithm is a very strong contender due to its simplicity and due to an added degree of freedom in its internal workings when used in the Connected Components setting. With the right tunings, the Karger-Klein-Tarjan algorithm can be implemented to be competitive in many cases. Higher graph density seems to benefit Karger-Klein-Tarjan relative to Sibeyn. Borůvka’s algorithm is not competitive with the two others.
From 12.12.2010 to 17.12.2010, the Dagstuhl Seminar 10501 "Advances and Applications of Automata on Words and Trees" was held in Schloss Dagstuhl - Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.
Seminar: 10501 - Advances and Applications of Automata on Words and Trees. The aim of the seminar was to discuss and systematize the recent fast progress in automata theory and to identify important directions for future research. For this, the seminar brought together more than 40 researchers from automata theory and related fields of applications. We had 19 talks of 30 minutes and 5 one-hour lectures leaving ample room for discussions. In the following we describe the topics in more detail.
This paper describes the ongoing efforts of the authors to present ancient Greek and Roman numismatic data on the public internet, with an emphasis on efforts to integrate information from multiple sources using Linked Data and Semantic Web techniques. By way of very modern metaphor, it is useful to think of coins as intentionally created packages of 'named entities'. Each coin was struck by a particular authority, often at a known site, and coins often make reference to familiar concepts such as deities, historical events, or symbols that were widely recognized in the ancient world. The institutions represented among the authors have deployed search interfaces that allow users to take advantage of this aspect of numismatic databases. The American Numismatic Society's database provides faceted search to its collection of over 550,000 objects. The Portable Antiquities Scheme (PAS) in the UK presents individual finds (and hoards) recorded throughout the country. The Römisch-Germanische Kommission and the University of Frankfurt (DBIS) are developing a prototype metaportal (INTERFACE) that accesses national databases of coin finds held in in Frankfurt, Vienna and Utrecht. Each of these resources is beginning to explore Semantic Web/Linked data approaches so that the role of numismatic standards is immediately coming to the fore. DBIS and INTERFACE are developing a numismatic ontology. At the ANS and PAS, the public database already presents RDF serializations based on Dublin Core. Together, the authors have begun to explore standardization of conceptual names on the basis of the vocabulary presented at the site http://nomisma.org . Nomisma.org is a collaborative effort to provide stable digital representations of numismatic concepts and entities. It provides URIs for such basic concepts as 'coin', 'mint', 'axis'. All of these are defined within the scope of numismatics but are already being linked to other stable resources where available. This is particularly the case for mints. For example, the URI http://nomisma.org/id/corinth is intended to represent that ancient city in its role as a minter/issuer of coins. The URI is linked via the SKOS ontology to the Pleiades Gazetteer of ancient places. This allows Nomisma to be the basis for a common representation of the concept that an object is a coin minted at Corinth. The ANS has already deployed such relationships in its public database. The work of all these projects is very much in progress so that this paper hopes to generate discussion on how multiple large projects can move forward in their own work while encouraging sufficient commonality to support large scale research questions undertaken by diverse audiences.
We present new concepts to integrate logic synthesis and physical design. Our methodology uses general Boolean transformations as known from technology-independent synthesis, and a recursive bi-partitioning placement algorithm. In each partitioning step, the precision of the layout data increases. This allows effective guidance of the logic synthesis operations for cycle time optimization. An additional advantage of our approach is that no complicated layout corrections are needed when the netlist is changed.
This paper considers the logic FOcard, i.e., first-order logic with cardinality predicates that can specify the size of a structure modulo some number. We study the expressive power of FOcard on the class of languages of ranked, finite, labelled trees with successor relations. Our first main result characterises the class of FOcard-definable tree languages in terms of algebraic closure properties of the tree languages. As it can be effectively checked whether the language of a given tree automaton satisfies these closure properties, we obtain a decidable characterisation of the class of regular tree languages definable in FOcard. Our second main result considers first-order logic with unary relations, successor relations, and two additional designated symbols < and + that must be interpreted as a linear order and its associated addition. Such a formula is called addition-invariant if, for each fixed interpretation of the unary relations and successor relations, its result is independent of the particular interpretation of < and +. We show that the FOcard-definable tree languages are exactly the regular tree languages definable in addition-invariant first-order logic. Our proof techniques involve tools from algebraic automata theory, reasoning with locality arguments, and the use of logical interpretations. We combine and extend methods developed by Benedikt and Segoufin (ACM ToCL, 2009) and Schweikardt and Segoufin (LICS, 2010).
We study threshold testing, an elementary probing model with the goal to choose a large value out of n i.i.d. random variables. An algorithm can test each variable X_i once for some threshold t_i, and the test returns binary feedback whether X_i ≥ t_i or not. Thresholds can be chosen adaptively or non-adaptively by the algorithm. Given the results for the tests of each variable, we then select the variable with highest conditional expectation. We compare the expected value obtained by the testing algorithm with expected maximum of the variables. Threshold testing is a semi-online variant of the gambler’s problem and prophet inequalities. Indeed, the optimal performance of non-adaptive algorithms for threshold testing is governed by the standard i.i.d. prophet inequality of approximately 0.745 + o(1) as n → ∞. We show how adaptive algorithms can significantly improve upon this ratio. Our adaptive testing strategy guarantees a competitive ratio of at least 0.869 - o(1). Moreover, we show that there are distributions that admit only a constant ratio c < 1, even when n → ∞. Finally, when each box can be tested multiple times (with n tests in total), we design an algorithm that achieves a ratio of 1 - o(1).
Ambiguity and communication
(2009)
The ambiguity of a nondeterministic finite automaton (NFA) N for input size n is the maximal number of accepting computations of N for an input of size n. For all k, r 2 N we construct languages Lr,k which can be recognized by NFA's with size k poly(r) and ambiguity O(nk), but Lr,k has only NFA's with exponential size, if ambiguity o(nk) is required. In particular, a hierarchy for polynomial ambiguity is obtained, solving a long standing open problem (Ravikumar and Ibarra, 1989, Leung, 1998).
Interest to become a data scientist or related professions in data science domain is rapidly growing. To meet such a demand, we propose a novel educational service that aims to provide tailored learning paths for data science. Our target user is one who aims to be an expert in data science. Our approach is to analyze the background of the practitioner and match the learning units. A critical feature is that we use gamification to reinforce the practitioner engagement. We believe that our work provides a practical guideline for those who want to learn data science.
We introduce a new method for representing and solving a general class of non-preemptive resource-constrained project scheduling problems. The new approach is to represent scheduling problems as de- scriptions (activity terms) in a language called RSV, which allows nested expressions using pll, seq, and xor. The activity-terms of RSV are similar to concepts in a description logic. The language RSV generalizes previous approaches to scheduling with variants insofar as it permits xor's not only of atomic activities but also of arbitrary activity terms. A specific semantics that assigns their set of active schedules to activity terms shows correctness of a calculus normalizing activity terms RSV similar to propositional DNF-computation. Based on RSV, this paper describes a diagram-based algorithm for the RSV-problem which uses a scan-line principle. The scan-line principle is used for determining and resolving the occurring resource conflicts and leads to a nonredundant generation of all active schedules and thus to a computation of the optimal schedule.
In order to promote the accessibility of biodiversity data in historic and contemporary literature, we introduce a new interdisciplinary project called BIOfid (FID=Fachinformationsdienst, a service for providing specialized information). The project aims at a mobilization of data available in print only by combining digitization of scientific biodiversity literature with the development of innovative text mining tools for complex, eventually semantic searches throughout the complete text corpus. A major prerequisite for the development of such search tools is the provision of sophisticated anatomy ontologies on the one hand, and of complete lists of species names (currently considered valid as well as all synonyms) at a global scale on the other hand. In the initial stage, we chose examples from German publications of the past 250 years dealing with the geographic distribution and ecology of vascular plants (Tracheophyta), birds (Aves), as well as moths and butterflies (Lepidoptera) in Germany. These taxa have been prioritized according to current demands of German research groups (about 50 sites) aiming at analyses and modeling of distribution patterns and their changes through time. In the long term, we aim at providing data and open source software applicable for any taxon and geographic region. For this purpose, a platform for open access journals for long-term availability of professional e-journals will be established. All generated data will also be made accessible through GFBio (German Federation for Biological Data). BIOfid is supported by the LIS-Scientific Library Services and Information Systems program of the German Research Foundation (DFG).
Exhaustive, automatic testing of dataflow (esp. mapreduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints, and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. "concolic") execution in a conventional programming language, adapted to the unique features of the dataflow domain.
In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks and (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach.