Refine
Year of publication
Document Type
- Article (392) (remove)
Language
- English (392) (remove)
Has Fulltext
- yes (392)
Is part of the Bibliography
- no (392)
Keywords
- Heavy Ion Experiments (19)
- Hadron-Hadron Scattering (11)
- Hadron-Hadron scattering (experiments) (10)
- LHC (8)
- Heavy-ion collision (7)
- Collective Flow (4)
- Petri net (4)
- Quark-Gluon Plasma (4)
- ALICE (3)
- ALICE experiment (3)
Institute
- Informatik (392) (remove)
In this talk we presented a novel technique, based on Deep Learning, to determine the impact parameter of nuclear collisions at the CBM experiment. PointNet based Deep Learning models are trained on UrQMD followed by CBMRoot simulations of Au+Au collisions at 10 AGeV to reconstruct the impact parameter of collisions from raw experimental data such as hits of the particles in the detector planes, tracks reconstructed from the hits or their combinations. The PointNet models can perform fast, accurate, event-by-event impact parameter determination in heavy ion collision experiments. They are shown to outperform a simple model which maps the track multiplicity to the impact parameter. While conventional methods for centrality classification merely provide an expected impact parameter distribution for a given centrality class, the PointNet models predict the impact parameter from 2–14 fm on an event-by-event basis with a mean error of −0.33 to 0.22 fm.
The ongoing digitalization of educational resources and the use of the internet lead to a steady increase of potentially available learning media. However, many of the media which are used for educational purposes have not been designed specifically for teaching and learning. Usually, linguistic criteria of readability and comprehensibility as well as content-related criteria are used independently to assess and compare the quality of educational media. This also holds true for educational media used in economics. This article aims to improve the analysis of textual learning media used in economic education by drawing on threshold concepts. Threshold concepts are key terms in knowledge acquisition within a domain. From a linguistic perspective, however, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features. In three kinds of (German) resources, namely in textbooks, in newspapers, and on Wikipedia, we investigate the distributive profiles of 63 threshold concepts identified in economics education (which have been collected from threshold concept research). We looked at the threshold concepts' frequency distribution, their compound distribution, and their network structure within the three kinds of resources. The two main findings of our analysis show that firstly, the three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles. Secondly, Wikipedia definitely shows stronger associative connections between economic threshold concepts than the other sources. We discuss the findings in relation to adequate media use for teaching and learning—not only in economic education.
Learning to solve graph tasks is one of the key prerequisites of acquiring domain-specific knowledge in most study domains. Analyses of graph understanding often use eye-tracking and focus on analyzing how much time students spend gazing at particular areas of a graph—Areas of Interest (AOIs). To gain a deeper insight into students’ task-solving process, we argue that the gaze shifts between students’ fixations on different AOIs (so-termed transitions) also need to be included in holistic analyses of graph understanding that consider the importance of transitions for the task-solving process. Thus, we introduced Epistemic Network Analysis (ENA) as a novel approach to analyze eye-tracking data of 23 university students who solved eight multiple-choice graph tasks in physics and economics. ENA is a method for quantifying, visualizing, and interpreting network data allowing a weighted analysis of the gaze patterns of both correct and incorrect graph task solvers considering the interrelations between fixations and transitions. After an analysis of the differences in the number of fixations and the number of single transitions between correct and incorrect solvers, we conducted an ENA for each task. We demonstrate that an isolated analysis of fixations and transitions provides only a limited insight into graph solving behavior. In contrast, ENA identifies differences between the gaze patterns of students who solved the graph tasks correctly and incorrectly across the multiple graph tasks. For instance, incorrect solvers shifted their gaze from the graph to the x-axis and from the question to the graph comparatively more often than correct solvers. The results indicate that incorrect solvers often have problems transferring textual information into graphical information and rely more on partly irrelevant parts of a graph. Finally, we discuss how the findings can be used to design experimental studies and for innovative instructional procedures in higher education
Volatility clustering and fat tails are prominently observed in financial markets. Here, we analyze the underlying mechanisms of three agent-based models explaining these stylized facts in terms of market instabilities and compare them on empirical grounds. To this end, we first develop a general framework for detecting tail events in stock markets. In particular, we introduce Hawkes processes to automatically identify and date onsets of market turmoils which result in increased volatility. Second, we introduce three different indicators to predict those onsets. Each of the three indicators is derived from and tailored to one of the models, namely quantifying information content, critical slowing down or market risk perception. Finally, we apply our indicators to simulated and real market data. We find that all indicators reliably predict market events on simulated data and clearly distinguish the different models. In contrast, a systematic comparison on the stocks of the Forbes 500 companies shows a markedly lower performance. Overall, predicting the onset of market turmoils appears difficult, yet, over very short time horizons high or rising volatility exhibits some predictive power.
Monitoring is an indispensable tool for the operation of any large installation of grid or cluster computing, be it high energy physics or elsewhere. Usually, monitoring is configured to collect a small amount of data, just enough to enable detection of abnormal conditions. Once detected, the abnormal condition is handled by gathering all information from the affected components. This data is processed by querying it in a manner similar to a database.
This contribution shows how the metaphor of a debugger (for software applications) can be transferred to a compute cluster. The concepts of variables, assertions and breakpoints that are used in debugging can be applied to monitoring by defining variables as the quantities recorded by monitoring and breakpoints as invariants formulated via these variables. It is found that embedding fragments of a data extracting and reporting tool such as the UNIX tool awk facilitates concise notations for commonly used variables since tools like awk are designed to process large event streams (in textual representations) with bounded memory. A functional notation similar to both the pipe notation used in the UNIX shell and the point-free style used in functional programming simplify the combination of variables that commonly occur when formulating breakpoints.
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for a number of 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In the way it measures the similarities of hypertexts, the article goes beyond existing approaches by examining their structural and semantic aspects intra- and intertextually. In this way it builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
A new method of event characterization based on Deep Learning is presented. The PointNet models can be used for fast, online event-by-event impact parameter determination at the CBM experiment. For this study, UrQMD and the CBM detector simulation are used to generate Au+Au collision events at 10 AGeV which are then used to train and evaluate PointNet based architectures. The models can be trained on features like the hit position of particles in the CBM detector planes, tracks reconstructed from the hits or combinations thereof. The Deep Learning models reconstruct impact parameters from 2-14 fm with a mean error varying from -0.33 to 0.22 fm. For impact parameters in the range of 5-14 fm, a model which uses the combination of hit and track information of particles has a relative precision of 4-9% and a mean error of -0.33 to 0.13 fm. In the same range of impact parameters, a model with only track information has a relative precision of 4-10% and a mean error of -0.18 to 0.22 fm. This new method of event-classification is shown to be more accurate and less model dependent than conventional methods and can utilize the performance boost of modern GPU processor units.
The impact of columnar file formats on SQL‐on‐hadoop engine performance: a study on ORC and Parquet
(2019)
Columnar file formats provide an efficient way to store data to be queried by SQL‐on‐Hadoop engines. Related works consider the performance of processing engine and file format together, which makes it impossible to predict their individual impact. In this work, we propose an alternative approach: by executing each file format on the same processing engine, we compare the different file formats as well as their different parameter settings. We apply our strategy to two processing engines, Hive and SparkSQL, and evaluate the performance of two columnar file formats, ORC and Parquet. We use BigBench (TPCx‐BB), a standardized application‐level benchmark for Big Data scenarios. Our experiments confirm that the file format selection and its configuration significantly affect the overall performance. We show that ORC generally performs better on Hive, whereas Parquet achieves best performance with SparkSQL. Using ZLIB compression brings up to 60.2% improvement with ORC, while Parquet achieves up to 7% improvement with Snappy. Exceptions are the queries involving text processing, which do not benefit from using any compression.
The specific temporal evolution of bacterial and phage population sizes, in particular bacterial depletion and the emergence of a resistant bacterial population, can be seen as a kinetic fingerprint that depends on the manifold interactions of the specific phage–host pair during the course of infection. We have elaborated such a kinetic fingerprint for a human urinary tract Klebsiella pneumoniae isolate and its phage vB_KpnP_Lessing by a modeling approach based on data from in vitro co-culture. We found a faster depletion of the initially sensitive bacterial population than expected from simple mass action kinetics. A possible explanation for the rapid decline of the bacterial population is a synergistic interaction of phages which can be a favorable feature for phage therapies. In addition to this interaction characteristic, analysis of the kinetic fingerprint of this bacteria and phage combination revealed several relevant aspects of their population dynamics: A reduction of the bacterial concentration can be achieved only at high multiplicity of infection whereas bacterial extinction is hardly accomplished. Furthermore the binding affinity of the phage to bacteria is identified as one of the most crucial parameters for the reduction of the bacterial population size. Thus, kinetic fingerprinting can be used to infer phage–host interactions and to explore emergent dynamics which facilitates a rational design of phage therapies.
BIOfid is a specialized information service currently being developed to mobilize biodiversity data dormant in printed historical and modern literature and to offer a platform for open access journals on the science of biodiversity. Our team of librarians, computer scientists and biologists produce high-quality text digitizations, develop new text-mining tools and generate detailed ontologies enabling semantic text analysis and semantic search by means of user-specific queries. In a pilot project we focus on German publications on the distribution and ecology of vascular plants, birds, moths and butterflies extending back to the Linnaeus period about 250 years ago. The three organism groups have been selected according to current demands of the relevant research community in Germany. The text corpus defined for this purpose comprises over 400 volumes with more than 100,000 pages to be digitized and will be complemented by journals from other digitization projects, copyright-free and project-related literature. With TextImager (Natural Language Processing & Text Visualization) and TextAnnotator (Discourse Semantic Annotation) we have already extended and launched tools that focus on the text-analytical section of our project. Furthermore, taxonomic and anatomical ontologies elaborated by us for the taxa prioritized by the project’s target group - German institutions and scientists active in biodiversity research - are constantly improved and expanded to maximize scientific data output. Our poster describes the general workflow of our project ranging from literature acquisition via software development, to data availability on the BIOfid web portal (http://biofid.de/), and the implementation into existing platforms which serve to promote global accessibility of biodiversity data.
Risk evaluations for agricultural chemicals are necessary to preserve healthy populations of honey bee colonies. Field studies on whole colonies are limited in behavioural research, while results from lab studies allow only restricted conclusions on whole colony impacts. Methods for automated long-term investigations of behaviours within comb cells, such as brood care, were hitherto missing. In the present study, we demonstrate an innovative video method that enables within-cell analysis in honey bee (Apis mellifera) observation hives to detect chronic sublethal neonicotinoid effects of clothianidin (1 and 10 ppb) and thiacloprid (200 ppb) on worker behaviour and development. In May and June, colonies which were fed 10 ppb clothianidin and 200 ppb thiacloprid in syrup over three weeks showed reduced feeding visits and duration throughout various larval development days (LDDs). On LDD 6 (capping day) total feeding duration did not differ between treatments. Behavioural adaptation was exhibited by nurses in the treatment groups in response to retarded larval development by increasing the overall feeding timespan. Using our machine learning algorithm, we demonstrate a novel method for detecting behaviours in an intact hive that can be applied in a versatile manner to conduct impact analyses of chemicals, pests and other stressors.
Measurement of ϒ(1S) elliptic flow at forward rapidity in Pb-Pb collisions at √sNN = 5.02 TeV
(2019)
The first measurement of the ϒ(1S) elliptic flow coefficient (v2) is performed at forward rapidity (2.5 < y < 4) in Pb–Pb collisions at √sNN = 5.02 TeV with the ALICE detector at the LHC. The results are obtained with the scalar product method and are reported as a function of transverse momentum (pT) up to 15 GeV/c in the 5%–60% centrality interval. The measured Υ(1S)v2 is consistent with 0 and with the small positive values predicted by transport models within uncertainties. The v2 coefficient in 2 < pT < 15 GeV/c is lower than that of inclusive J/ψ mesons in the same pT interval by 2.6 standard deviations. These results, combined with earlier suppression measurements, are in agreement with a scenario in which the Υ(1S) production in Pb–Pb collisions at LHC energies is dominated by dissociation limited to the early stage of the collision, whereas in the J/ψ case there is substantial experimental evidence of an additional regeneration component.
Dancing is an activity that positively enhances the mood of people that consists of feeling the music and expressing it in rhythmic movements with the body. Learning how to dance can be challenging because it requires proper coordination and understanding of rhythm and beat. In this paper, we present the first implementation of the Dancing Coach (DC), a generic system designed to support the practice of dancing steps, which in its current state supports the practice of basic salsa dancing steps. However, the DC has been designed to allow the addition of more dance styles. We also present the first user evaluation of the DC, which consists of user tests with 25 participants. Results from the user test show that participants stated they had learned the basic salsa dancing steps, to move to the beat and body coordination in a fun way. Results also point out some direction on how to improve the future versions of the DC.
Iconographic representations on ancient artifacts are described in many existing databases and literature as human readable text. We applied Natural Language Processing (NLP) approaches in order to extract the semantics out of these textual descriptions and in this way enable semantic searches over them. This allows more sophisticated requests compared to the common existing keyword searches. As we show in our experiments based on numismatic datasets, the approach is generic in the sense that once the system is trained on one dataset, it can be applied without any further manual work also to datasets that have similar content. Of course, additional adaptions would further improve the results. Since the approach requires manual work only during the training phase, it can easily be applied to huge datasets without manual work and therefore without major extra costs. In fact, in our experience bigger datasets generate even better results because there is more data for training. Since our approach is not bound to a certain domain and the numismatic datasets are just an example, it could serve as a blueprint for many other areas. It could also help to build bridges between disciplines since textual iconographic descriptions are to be found also for pottery, sculpture and elsewhere.
Correction to: Scientifc Reports https://doi.org/10.1038/s41598-019-43857-5, published online 17 May 2019. In the original version of this Article, Jan-Hendrik Trösemeier was incorrectly affiliated with ‘Division of Allergology, Paul Ehrlich Institut, Langen, Germany’. Te correct afliations are listed below...
Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons’ receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.
In pathology, tissue images are evaluated using a light microscope, relying on the expertise and experience of pathologists. There is a great need for computational methods to quantify and standardize histological observations. Computational quantification methods become more and more essential to evaluate tissue images. In particular, the distribution of tumor cells and their microenvironment are of special interest. Here, we systematically investigated tumor cell properties and their spatial neighborhood relations by a new application of statistical analysis to whole slide images of Hodgkin lymphoma, a tumor arising in lymph nodes, and inflammation of lymph nodes called lymphadenitis. We considered properties of more than 400, 000 immunohistochemically stained, CD30-positive cells in 35 whole slide images of tissue sections from subtypes of the classical Hodgkin lymphoma, nodular sclerosis and mixed cellularity, as well as from lymphadenitis. We found that cells of specific morphology exhibited significant favored and unfavored spatial neighborhood relations of cells in dependence of their morphology. This information is important to evaluate differences between Hodgkin lymph nodes infiltrated by tumor cells (Hodgkin lymphoma) and inflamed lymph nodes, concerning the neighborhood relations of cells and the sizes of cells. The quantification of neighborhood relations revealed new insights of relations of CD30-positive cells in different diagnosis cases. The approach is general and can easily be applied to whole slide image analysis of other tumor types.
The morphology of presynaptic specializations can vary greatly ranging from classical single-release-site boutons in the central nervous system to boutons of various sizes harboring multiple vesicle release sites. Multi-release-site boutons can be found in several neural contexts, for example at the neuromuscular junction (NMJ) of body wall muscles of Drosophila larvae. These NMJs are built by two motor neurons forming two types of glutamatergic multi-release-site boutons with two typical diameters. However, it is unknown why these distinct nerve terminal configurations are used on the same postsynaptic muscle fiber. To systematically dissect the biophysical properties of these boutons we developed a full three-dimensional model of such boutons, their release sites and transmitter-harboring vesicles and analyzed the local vesicle dynamics of various configurations during stimulation. Here we show that the rate of transmission of a bouton is primarily limited by diffusion-based vesicle movements and that the probability of vesicle release and the size of a bouton affect bouton-performance in distinct temporal domains allowing for an optimal transmission of the neural signals at different time scales. A comparison of our in silico simulations with in vivo recordings of the natural motor pattern of both neurons revealed that the bouton properties resemble a well-tuned cooperation of the parameters release probability and bouton size, enabling a reliable transmission of the prevailing firing-pattern at diffusion-limited boutons. Our findings indicate that the prevailing firing-pattern of a neuron may determine the physiological and morphological parameters required for its synaptic terminals.
Relying on the theory of Saward (2010) and Disch (2015), we study political representation through the lens of representative claim-making. We identify a gap between the theoretical concept of claim-making and the empirical (quantitative) assessment of representative claims made in the real world’s representative contexts. Therefore, we develop a new approach to map and quantify representative claims in order to subsequently measure the reception and validation of the claims by the audience. To test our method, we analyse all the debates of the German parliament concerned with the introduction of the gender quota in German supervisory boards from 2013 to 2017 in a two-step process. At first, we assess which constituencies the MPs claim to represent and how they justify their stance. Drawing on multiple correspondence analysis, we identify different claim patterns. Second, making use of natural language processing techniques and logistic regression on social media data, we measure if and how the asserted claims in the parliamentary debates are received and validated by the respective audience. We come to the conclusion that the constituency as ultimate judge of legitimacy has not been comprehensively conceptualized yet.
The formulation of the Partial Information Decomposition (PID) framework by Williams and Beer in 2010 attracted a significant amount of attention to the problem of defining redundant (or shared), unique and synergistic (or complementary) components of mutual information that a set of source variables provides about a target. This attention resulted in a number of measures proposed to capture these concepts, theoretical investigations into such measures, and applications to empirical data (in particular to datasets from neuroscience). In this Special Issue on “Information Decomposition of Target Effects from Multi-Source Interactions” at Entropy, we have gathered current work on such information decomposition approaches from many of the leading research groups in the field. We begin our editorial by providing the reader with a review of previous information decomposition research, including an overview of the variety of measures proposed, how they have been interpreted and applied to empirical investigations. We then introduce the articles included in the special issue one by one, providing a similar categorisation of these articles into: i. proposals of new measures; ii. theoretical investigations into properties and interpretations of such approaches, and iii. applications of these measures in empirical studies. We finish by providing an outlook on the future of the field.
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open them to a larger research community. Our contribution is two-fold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Anaplastic large cell lymphoma (ALCL) and classical Hodgkin lymphoma (cHL) are lymphomas that contain CD30-expressing tumor cells and have numerous pathological similarities. Whereas ALCL is usually diagnosed at an advanced stage, cHL more frequently presents with localized disease. The aim of the present study was to elucidate the mechanisms underlying the different clinical presentation of ALCL and cHL. Chemokine and chemokine receptor expression were similar in primary ALCL and cHL cases apart from the known overexpression of the chemokines CCL17 and CCL22 in the Hodgkin and Reed-Sternberg (HRS) cells of cHL. Consistent with the overexpression of these chemokines, primary cHL cases encountered a significantly denser T cell microenvironment than ALCL. Additionally to differences in the interaction with their microenvironment, cHL cell lines presented a lower and less efficient intrinsic cell motility than ALCL cell lines, as assessed by time-lapse microscopy in a collagen gel and transwell migration assays. We thus propose that the combination of impaired basal cell motility and differences in the interaction with the microenvironment hamper the dissemination of HRS cells in cHL when compared with the tumor cells of ALCL.
The development of multimodal sensor-based applications designed to support learners with the improvement of their skills is expensive since most of these applications are tailor-made and built from scratch. In this paper, we show how the Presentation Trainer (PT), a multimodal sensor-based application designed to support the development of public speaking skills, can be modularly extended with a Virtual Reality real-time feedback module (VR module), which makes usage of the PT more immersive and comprehensive. The described study consists of a formative evaluation and has two main objectives. Firstly, a technical objective is concerned with the feasibility of extending the PT with an immersive VR Module. Secondly, a user experience objective focuses on the level of satisfaction of interacting with the VR extended PT. To study these objectives, we conducted user tests with 20 participants. Results from our test show the feasibility of modularly extending existing multimodal sensor-based applications, and in terms of learning and user experience, results indicate a positive attitude of the participants towards using the application (PT+VR module).
Background: Microarray analysis represents a powerful way to test scientific hypotheses on the functionality of cells. The measurements consider the whole genome, and the large number of generated data requires sophisticated analysis. To date, no gold-standard for the analysis of microarray images has been established. Due to the lack of a standard approach there is a strong need to identify new processing algorithms.
Methods: We propose a novel approach based on hyperbolic partial differential equations (PDEs) for unsupervised spot segmentation. Prior to segmentation, morphological operations were applied for the identification of co-localized groups of spots. A grid alignment was performed to determine the borderlines between rows and columns of spots. PDEs were applied to detect the inflection points within each column and row; vertical and horizontal luminance profiles were evolved respectively. The inflection points of the profiles determined borderlines that confined a spot within adapted rectangular areas. A subsequent k-means clustering determined the pixels of each individual spot and its local background.
Results: We evaluated the approach for a data set of microarray images taken from the Stanford Microarray Database (SMD). The data set is based on two studies on global gene expression profiles of Arabidopsis Thaliana. We computed values for spot intensity, regression ratio, and coefficient of determination. For spots with irregular contours and inner holes, we found intensity values that were significantly different from those determined by the GenePix Pro microarray analysis software. We determined the set of differentially expressed genes from our intensities and identified more activated genes than were predicted by the GenePix software.
Conclusions: Our method represents a worthwhile alternative and complement to standard approaches used in industry and academy. We highlight the importance of our spot segmentation approach, which identified supplementary important genes, to better explains the molecular mechanisms that are activated in a defense responses to virus and pathogen infection.
The production of K∗(892)0 and ϕ(1020) mesons has been measured in p–Pb collisions at √sNN = 5.02 TeV. K∗0 and ϕ are reconstructed via their decay into charged hadrons with the ALICE detector in the rapidity range - 0.5 < y < 0. The transverse momentum spectra, measured as a function of the multiplicity, have a pT range from 0 to 15 GeV/c for K∗0 and from 0.3 to 21 GeV/c for ϕ. Integrated yields, mean transverse momenta and particle ratios are reported and compared with results in pp collisions at √s = 7 TeV and Pb–Pb collisions at √sNN = 2.76 TeV. In Pb–Pb and p–Pb collisions, K∗0 and ϕ probe the hadronic phase of the system and contribute to the study of particle formation mechanisms by comparison with other identified hadrons. For this purpose, the mean transverse momenta and the differential proton-to-ϕ ratio are discussed as a function of the multiplicity of the event. The short-lived K∗0 is measured to investigate re-scattering effects, believed to be related to the size of the system and to the lifetime of the hadronic phase.
Heterologously expressed genes require adaptation to the host organism to ensure adequate levels of protein synthesis, which is typically approached by replacing codons by the target organism’s preferred codons. In view of frequently encountered suboptimal outcomes we introduce the codon-specific elongation model (COSEM) as an alternative concept. COSEM simulates ribosome dynamics during mRNA translation and informs about protein synthesis rates per mRNA in an organism- and context-dependent way. Protein synthesis rates from COSEM are integrated with further relevant covariates such as translation accuracy into a protein expression score that we use for codon optimization. The scoring algorithm further enables fine-tuning of protein expression including deoptimization and is implemented in the software OCTOPOS. The protein expression score produces competitive predictions on proteomic data from prokaryotic, eukaryotic, and human expression systems. In addition, we optimized and tested heterologous expression of manA and ova genes in Salmonella enterica serovar Typhimurium. Superiority over standard methodology was demonstrated by a threefold increase in protein yield compared to wildtype and commercially optimized sequences.
CRFVoter : gene and protein related object recognition using a conglomerate of CRF-based tools
(2019)
Background: Gene and protein related objects are an important class of entities in biomedical research, whose identification and extraction from scientific articles is attracting increasing interest. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of gene and protein related objects. For this purpose, we transform the task as posed by BioCreative V.5 into a sequence labeling problem. We present a series of sequence labeling systems that we used and adapted in our experiments for solving this task. Our experiments show how to optimize the hyperparameters of the classifiers involved. To this end, we utilize various algorithms for hyperparameter optimization. Finally, we present CRFVoter, a two-stage application of Conditional Random Field (CRF) that integrates the optimized sequence labelers from our study into one ensemble classifier.
Results: We analyze the impact of hyperparameter optimization regarding named entity recognition in biomedical research and show that this optimization results in a performance increase of up to 60%. In our evaluation, our ensemble classifier based on multiple sequence labelers, called CRFVoter, outperforms each individual extractor’s performance. For the blinded test set provided by the BioCreative organizers, CRFVoter achieves an F-score of 75%, a recall of 71% and a precision of 80%. For the GPRO type 1 evaluation, CRFVoter achieves an F-Score of 73%, a recall of 70% and achieved the best precision (77%) among all task participants.
Conclusion: CRFVoter is effective when multiple sequence labeling systems are to be used and performs better then the individual systems collected by it.
The hepatitis C virus (HCV) RNA replication cycle is a dynamic intracellular process occurring in three-dimensional space (3D), which is difficult both to capture experimentally and to visualize conceptually. HCV-generated replication factories are housed within virus-induced intracellular structures termed membranous webs (MW), which are derived from the Endoplasmatic Reticulum (ER). Recently, we published 3D spatiotemporal resolved diffusion–reaction models of the HCV RNA replication cycle by means of surface partial differential equation (sPDE) descriptions. We distinguished between the basic components of the HCV RNA replication cycle, namely HCV RNA, non-structural viral proteins (NSPs), and a host factor. In particular, we evaluated the sPDE models upon realistic reconstructed intracellular compartments (ER/MW). In this paper, we propose a significant extension of the model based upon two additional parameters: different aggregate states of HCV RNA and NSPs, and population dynamics inspired diffusion and reaction coefficients instead of multilinear ones. The combination of both aspects enables realistic modeling of viral replication at all scales. Specifically, we describe a replication complex state consisting of HCV RNA together with a defined amount of NSPs. As a result of the combination of spatial resolution and different aggregate states, the new model mimics a cis requirement for HCV RNA replication. We used heuristic parameters for our simulations, which were run only on a subsection of the ER. Nevertheless, this was sufficient to allow the fitting of core aspects of virus reproduction, at least qualitatively. Our findings should help stimulate new model approaches and experimental directions for virology.
LSTMVoter : chemical named entity recognition using a conglomerate of sequence labeling tools
(2019)
Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier.
Results: We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores information about features that is modeled by means of an attention mechanism. LSTMVoter outperforms each extractor integrated by it in a series of experiments. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%.
Availability and implementation: Data and code are available at https://github.com/texttechnologylab/LSTMVoter.
The endoplasmic reticulum–mitochondria encounter structure (ERMES) connects the mitochondrial outer membrane with the ER. Multiple functions have been linked to ERMES, including maintenance of mitochondrial morphology, protein assembly and phospholipid homeostasis. Since the mitochondrial distribution and morphology protein Mdm10 is present in both ERMES and the mitochondrial sorting and assembly machinery (SAM), it is unknown how the ERMES functions are connected on a molecular level. Here we report that conserved surface areas on opposite sides of the Mdm10 β-barrel interact with SAM and ERMES, respectively. We generated point mutants to separate protein assembly (SAM) from morphology and phospholipid homeostasis (ERMES). Our study reveals that the β-barrel channel of Mdm10 serves different functions. Mdm10 promotes the biogenesis of α-helical and β-barrel proteins at SAM and functions as integral membrane anchor of ERMES, demonstrating that SAM-mediated protein assembly is distinct from ER-mitochondria contact sites.
In this paper, we study the limit of compactness which is a graph index originally introduced for measuring structural characteristics of hypermedia. Applying compactness to large scale small-world graphs (Mehler, 2008) observed its limit behaviour to be equal 1. The striking question concerning this finding was whether this limit behaviour resulted from the specifics of small-world graphs or was simply an artefact. In this paper, we determine the necessary and sufficient conditions for any sequence of connected graphs resulting in a limit value of CB = 1 which can be generalized with some consideration for the case of disconnected graph classes (Theorem 3). This result can be applied to many well-known classes of connected graphs. Here, we illustrate it by considering four examples. In fact, our proof-theoretical approach allows for quickly obtaining the limit value of compactness for many graph classes sparing computational costs.
Background: Although mortality after cardiac surgery has significantly decreased in the last decade, patients still experience clinically relevant postoperative complications. Among others, atrial fibrillation (AF) is a common consequence of cardiac surgery, which is associated with prolonged hospitalization and increased mortality.
Methods: We retrospectively analyzed data from patients who underwent coronary artery bypass grafting, valve surgery or a combination of both at the University Hospital Muenster between April 2014 and July 2015. We evaluated the incidence of new onset and intermittent/permanent AF (patients with pre- and postoperative AF). Furthermore, we investigated the impact of postoperative AF on clinical outcomes and evaluated potential risk factors.
Results: In total, 999 patients were included in the analysis. New onset AF occurred in 24.9% of the patients and the incidence of intermittent/permanent AF was 59.5%. Both types of postoperative AF were associated with prolonged ICU length of stay (median increase approx. 2 days) and duration of mechanical ventilation (median increase 1 h). Additionally, new onset AF patients had a higher rate of dialysis and hospital mortality and more positive fluid balance on the day of surgery and postoperative days 1 and 2. In a multiple logistic regression model, advanced age (odds ratio (OR) = 1.448 per decade increase, p < 0.0001), a combination of CABG and valve surgery (OR = 1.711, p = 0.047), higher C-reactive protein (OR = 1.06 per unit increase, p < 0.0001) and creatinine plasma concentration (OR = 1.287 per unit increase, p = 0.032) significantly predicted new onset AF. Higher Horowitz index values were associated with a reduced risk (OR = 0.996 per unit increase, p = 0.012). In a separate model, higher plasma creatinine concentration (OR = 2.125 per unit increase, p = 0.022) was a significant risk factor for intermittent/permanent AF whereas higher plasma phosphate concentration (OR = 0.522 per unit increase, p = 0.003) indicated reduced occurrence of this arrhythmia.
Conclusions: New onset and intermittent/permanent AF are associated with adverse clinical outcomes of elective cardiac surgery patients. Different risk factors implicated in postoperative AF suggest different mechanisms might be involved in its pathogenesis. Customized clinical management protocols seem to be warranted for a higher success rate of prevention and treatment of postoperative AF.
The transverse momentum distributions of the strange and double-strange hyperon resonances (Σ(1385)±,Ξ(1530)0) produced in p–Pb collisions at √sNN = 5.02 TeV were measured in the rapidity range −0.5<yCMS<0 for event classes corresponding to different charged-particle multiplicity densities, ⟨dNch/dηlab⟩. The mean transverse momentum values are presented as a function of ⟨dNch/dηlab⟩, as well as a function of the particle masses and compared with previous results on hyperon production. The integrated yield ratios of excited to ground-state hyperons are constant as a function of ⟨dNch/dηlab⟩. The equivalent ratios to pions exhibit an increase with ⟨dNch/dηlab⟩, depending on their strangeness content.
Motivation: Arabidopsis thaliana is a well-established model system for the analysis of the basic physiological and metabolic pathways of plants. Nevertheless, the system is not yet fully understood, although many mechanisms are described, and information for many processes exists. However, the combination and interpretation of the large amount of biological data remain a big challenge, not only because data sets for metabolic paths are still incomplete. Moreover, they are often inconsistent, because they are coming from different experiments of various scales, regarding, for example, accuracy and/or significance. Here, theoretical modeling is powerful to formulate hypotheses for pathways and the dynamics of the metabolism, even if the biological data are incomplete. To develop reliable mathematical models they have to be proven for consistency. This is still a challenging task because many verification techniques fail already for middle-sized models. Consequently, new methods, like decomposition methods or reduction approaches, are developed to circumvent this problem.
Methods: We present a new semi-quantitative mathematical model of the metabolism of Arabidopsis thaliana. We used the Petri net formalism to express the complex reaction system in a mathematically unique manner. To verify the model for correctness and consistency we applied concepts of network decomposition and network reduction such as transition invariants, common transition pairs, and invariant transition pairs.
Results: We formulated the core metabolism of Arabidopsis thaliana based on recent knowledge from literature, including the Calvin cycle, glycolysis and citric acid cycle, glyoxylate cycle, urea cycle, sucrose synthesis, and the starch metabolism. By applying network decomposition and reduction techniques at steady-state conditions, we suggest a straightforward mathematical modeling process. We demonstrate that potential steady-state pathways exist, which provide the fixed carbon to nearly all parts of the network, especially to the citric acid cycle. There is a close cooperation of important metabolic pathways, e.g., the de novo synthesis of uridine-5-monophosphate, the γ-aminobutyric acid shunt, and the urea cycle. The presented approach extends the established methods for a feasible interpretation of biological network models, in particular of large and complex models.
Background: Signal transduction pathways are important cellular processes to maintain the cell’s integrity. Their imbalance can cause severe pathologies. As signal transduction pathways feature complex regulations, they form intertwined networks. Mathematical models aim to capture their regulatory logic and allow an unbiased analysis of robustness and vulnerability of the signaling network. Pathway detection is yet a challenge for the analysis of signaling networks in the field of systems biology. A rigorous mathematical formalism is lacking to identify all possible signal flows in a network model.
Results: In this paper, we introduce the concept of Manatee invariants for the analysis of signal transduction networks. We present an algorithm for the characterization of the combinatorial diversity of signal flows, e.g., from signal reception to cellular response. We demonstrate the concept for a small model of the TNFR1-mediated NF- κB signaling pathway. Manatee invariants reveal all possible signal flows in the network. Further, we show the application of Manatee invariants for in silico knockout experiments. Here, we illustrate the biological relevance of the concept.
Conclusions: The proposed mathematical framework reveals the entire variety of signal flows in models of signaling systems, including cyclic regulations. Thereby, Manatee invariants allow for the analysis of robustness and vulnerability of signaling networks. The application to further analyses such as for in silico knockout was shown. The new framework of Manatee invariants contributes to an advanced examination of signaling systems.
Exploring biophysical properties of virus-encoded components and their requirement for virus replication is an exciting new area of interdisciplinary virological research. To date, spatial resolution has only rarely been analyzed in computational/biophysical descriptions of virus replication dynamics. However, it is widely acknowledged that intracellular spatial dependence is a crucial component of virus life cycles. The hepatitis C virus-encoded NS5A protein is an endoplasmatic reticulum (ER)-anchored viral protein and an essential component of the virus replication machinery. Therefore, we simulate NS5A dynamics on realistic reconstructed, curved ER surfaces by means of surface partial differential equations (sPDE) upon unstructured grids. We match the in silico NS5A diffusion constant such that the NS5A sPDE simulation data reproduce experimental NS5A fluorescence recovery after photobleaching (FRAP) time series data. This parameter estimation yields the NS5A diffusion constant. Such parameters are needed for spatial models of HCV dynamics, which we are developing in parallel but remain qualitative at this stage. Thus, our present study likely provides the first quantitative biophysical description of the movement of a viral component. Our spatio-temporal resolved ansatz paves new ways for understanding intricate spatial-defined processes central to specfic aspects of virus life cycles.
Data structures and advanced models of computation on big data : report from Dagstuhl seminar 14091
(2014)
This report documents the program and the outcomes of Dagstuhl Seminar 14091 "Data Structures and Advanced Models of Computation on Big Data". In today's computing environment vast amounts of data are processed, exchanged and analyzed. The manner in which information is stored profoundly influences the efficiency of these operations over the data. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances.
The seminar covered both recent advances in the "classical" data structuring topics as well as new models of computation adapted to modern architectures, scientific studies that reveal the need for such models, applications where large data sets play a central role, modern computing platforms for very large data, and new data structures for large data in modern architectures.
The extended abstracts included in this report contain both recent state of the art advances and lay the foundation for new directions within data structures research.
Synaptic release sites are characterized by exocytosis-competent synaptic vesicles tightly anchored to the presynaptic active zone (PAZ) whose proteome orchestrates the fast signaling events involved in synaptic vesicle cycle and plasticity. Allocation of the amyloid precursor protein (APP) to the PAZ proteome implicated a functional impact of APP in neuronal communication. In this study, we combined state-of-the-art proteomics, electrophysiology and bioinformatics to address protein abundance and functional changes at the native hippocampal PAZ in young and old APP-KO mice. We evaluated if APP deletion has an impact on the metabolic activity of presynaptic mitochondria. Furthermore, we quantified differences in the phosphorylation status after long-term-potentiation (LTP) induction at the purified native PAZ. We observed an increase in the phosphorylation of the signaling enzyme calmodulin-dependent kinase II (CaMKII) only in old APP-KO mice. During aging APP deletion is accompanied by a severe decrease in metabolic activity and hyperphosphorylation of CaMKII. This attributes an essential functional role to APP at hippocampal PAZ and putative molecular mechanisms underlying the age-dependent impairments in learning and memory in APP-KO mice.
We study Gaifman locality and Hanf locality of an extension of first-order logic with modulo p counting quantifiers (FO+MODp , for short) with arbitrary numerical predicates. We require that the validity of formulas is independent of the particular interpretation of the numerical predicates and refer to such formulas as arb-invariant formulas. This paper gives a detailed picture of locality and non-locality properties of arb-invariant FO+MODp . For example, on the class of all finite structures, for any p 2, arb-invariant FO+MODp is neither Hanf nor Gaifman local with respect to a sublinear locality radius. However, in case that p is an odd prime power, it is weakly Gaifman local with a polylogarithmic locality radius. And when restricting attention to the class of string structures, for odd prime powers p, arb-invariant FO+MODp is both Hanf and Gaifman local with a polylogarithmic locality radius. Our negative results build on examples of order-invariant FO+MODp formulas presented in Niemist ̈o’s PhD thesis. Our positive results make use of the close connection between FO+MODp and Boolean circuits built from NOT-gates and AND-, OR-, and MOD p - gates of arbitrary fan-in.
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dictionary lookup’ strategy performs in this context and find that such an approach can significantly outdo baselines such as edit distance, weighted edit distance, and the noisy channel Brill and Moore model to spelling error correction. We also consider elementary combination techniques for our models such as language model weighted majority voting and center string combination. Finally, we consider real-world OCR post-correction for a dataset sampled from medieval Latin texts.
Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities - bridging the gap between Computer Science and Digital Humanities”.
1998 ACM Subject Classification I.2.7 Natural Language Processing, J.5 Arts and Humanities
We present results on transverse momentum (pT) and rapidity (y) differential production cross sections, mean transverse momentum and mean transverse momentum square of inclusive J/ψ and ψ(2S) at forward rapidity (2.5 < y < 4) as well as ψ(2S)-to-J/ψ cross section ratios. These quantities are measured in pp collisions at center of mass energies s√=5.02 and 13 TeV with the ALICE detector. Both charmonium states are reconstructed in the dimuon decay channel, using the muon spectrometer. A comprehensive comparison to inclusive charmonium cross sections measured at s√=2.76, 7 and 8 TeV is performed. A comparison to non-relativistic quantum chromodynamics and fixed-order next-to-leading logarithm calculations, which describe prompt and non-prompt charmonium production respectively, is also presented. A good description of the data is obtained over the full pT range, provided that both contributions are summed. In particular, it is found that for pT > 15 GeV/c the non-prompt contribution reaches up to 50% of the total charmonium yield.
This is a short summary of a recent survey [FR03] focusing on the observed evidence, that Internet connectivity is positively correlated with spread of democracy at high levels of significance. The results of multivariate correlation analysis and probabilities regression estimate models are based on the combined analysis of mid - 1991’s, to 2001 data series of the Eurostat’s and US Census Bureau, the World Bank, and OECD’s statistical data service which track the growth of information technology and rating of freedom and democracy worldwide.
We present an implementation of an interpreter LRPi for the call-by-need calculus LRP, based on a variant of Sestoft's abstract machine Mark 1, extended with an eager garbage collector. It is used as a tool for exact space usage analyses as a support for our investigations into space improvements of call-by-need calculi.
50 years of amino acid hydrophobicity scales : revisiting the capacity for peptide classification
(2016)
Background: Physicochemical properties are frequently analyzed to characterize protein-sequences of known and unknown function. Especially the hydrophobicity of amino acids is often used for structural prediction or for the detection of membrane associated or embedded β-sheets and α-helices. For this purpose many scales classifying amino acids according to their physicochemical properties have been defined over the past decades. In parallel, several hydrophobicity parameters have been defined for calculation of peptide properties. We analyzed the performance of separating sequence pools using 98 hydrophobicity scales and five different hydrophobicity parameters, namely the overall hydrophobicity, the hydrophobic moment for detection of the α-helical and β-sheet membrane segments, the alternating hydrophobicity and the exact ß-strand score.
Results: Most of the scales are capable of discriminating between transmembrane α-helices and transmembrane β-sheets, but assignment of peptides to pools of soluble peptides of different secondary structures is not achieved at the same quality. The separation capacity as measure of the discrimination between different structural elements is best by using the five different hydrophobicity parameters, but addition of the alternating hydrophobicity does not provide a large benefit. An in silico evolutionary approach shows that scales have limitation in separation capacity with a maximal threshold of 0.6 in general. We observed that scales derived from the evolutionary approach performed best in separating the different peptide pools when values for arginine and tyrosine were largely distinct from the value of glutamate. Finally, the separation of secondary structure pools via hydrophobicity can be supported by specific detectable patterns of four amino acids.
Conclusion: It could be assumed that the quality of separation capacity of a certain scale depends on the spacing of the hydrophobicity value of certain amino acids. Irrespective of the wealth of hydrophobicity scales a scale separating all different kinds of secondary structures or between soluble and transmembrane peptides does not exist reflecting that properties other than hydrophobicity affect secondary structure formation as well. Nevertheless, application of hydrophobicity scales allows distinguishing between peptides with transmembrane α-helices and β-sheets. Furthermore, the overall separation capacity score of 0.6 using different hydrophobicity parameters could be assisted by pattern search on the protein sequence level for specific peptides with a length of four amino acids.
The amyloid precursor protein (APP) was discovered in the 1980s as the precursor protein of the amyloid A4 peptide. The amyloid A4 peptide, also known as A-beta (Aβ), is the main constituent of senile plaques implicated in Alzheimer’s disease (AD). In association with the amyloid deposits, increasing impairments in learning and memory as well as the degeneration of neurons especially in the hippocampus formation are hallmarks of the pathogenesis of AD. Within the last decades much effort has been expended into understanding the pathogenesis of AD. However, little is known about the physiological role of APP within the central nervous system (CNS). Allocating APP to the proteome of the highly dynamic presynaptic active zone (PAZ) identified APP as a novel player within this neuronal communication and signaling network. The analysis of the hippocampal PAZ proteome derived from APP-mutant mice demonstrates that APP is tightly embedded in the underlying protein network. Strikingly, APP deletion accounts for major dysregulation within the PAZ proteome network. Ca2+-homeostasis, neurotransmitter release and mitochondrial function are affected and resemble the outcome during the pathogenesis of AD. The observed changes in protein abundance that occur in the absence of APP as well as in AD suggest that APP is a structural and functional regulator within the hippocampal PAZ proteome. Within this review article, we intend to introduce APP as an important player within the hippocampal PAZ proteome and to outline the impact of APP deletion on individual PAZ proteome subcommunities.
The degradation of cytosol-invading pathogens by autophagy, a process known as xenophagy, is an important mechanism of the innate immune system. Inside the host, Salmonella Typhimurium invades epithelial cells and resides within a specialized intracellular compartment, the Salmonella-containing vacuole. A fraction of these bacteria does not persist inside the vacuole and enters the host cytosol. Salmonella Typhimurium that invades the host cytosol becomes a target of the autophagy machinery for degradation. The xenophagy pathway has recently been discovered, and the exact molecular processes are not entirely characterized. Complete kinetic data for each molecular process is not available, so far. We developed a mathematical model of the xenophagy pathway to investigate this key defense mechanism. In this paper, we present a Petri net model of Salmonella xenophagy in epithelial cells. The model is based on functional information derived from literature data. It comprises the molecular mechanism of galectin-8-dependent and ubiquitin-dependent autophagy, including regulatory processes, like nutrient-dependent regulation of autophagy and TBK1-dependent activation of the autophagy receptor, OPTN. To model the activation of TBK1, we proposed a new mechanism of TBK1 activation, suggesting a spatial and temporal regulation of this process. Using standard Petri net analysis techniques, we found basic functional modules, which describe different pathways of the autophagic capture of Salmonella and reflect the basic dynamics of the system. To verify the model, we performed in silico knockout experiments. We introduced a new concept of knockout analysis to systematically compute and visualize the results, using an in silico knockout matrix. The results of the in silico knockout analyses were consistent with published experimental results and provide a basis for future investigations of the Salmonella xenophagy pathway.
Author Summary
Salmonellae are Gram-negative bacteria, which cause the majority of foodborne diseases worldwide. Serovars of Salmonella cause a broad range of diseases, ranging from diarrhea to typhoid fever in a variety of hosts. In the year 2010, Salmonella Typhi caused 7.6 million foodborne diseases and 52 000 deaths, and Salmonella enterica was responsible for 78.7 million diseases and 59 000 deaths. After invasion of Salmonella into host epithelial cells, a small fraction of Salmonella escapes from a specialized intracellular compartment and replicates inside the host cytosol. Xenophagy is a host defense mechanism to protect the host cell from cytosolic pathogens. Understanding how Salmonella is recognized and targeted for xenophagy is an important subject of current research. To the best of our knowledge, no mathematical model has been presented so far, describing the process of Salmonella Typhimurium xenophagy. Here, we present a manually curated and mathematically verified theoretical model of Salmonella Typhimurium xenophagy in epithelial cells, which is consistent with the current state of knowledge. Our model reproduces literature data and postulates new hypotheses for future investigations.
The future heavy-ion experiment CBM (FAIR/GSI, Darmstadt, Germany) will focus on measurement of very rare probes at interaction rates up to 10 MHz with data flow of up to 1 TB/s. The beam will provide free stream of beam particles without bunch structure. That requires full online event reconstruction and selection not only in space, but also in time, so-called 4D event building and selection.
The FLES (First-Level Event Selection) reconstruction and selection package consists of several modules: track finding, track fitting, short-lived particles finding, event building and event selection. A time-slice is reconstructed in parallel between cores within a same CPU, thus minimizing the communication between CPUs. After all tracks are found and fitted in 4D, they are collected into clusters of tracks originated from common primary vertices, which then are fitted, thus identifying 4D interaction points registered within the time-slice. Secondary tracks are associated with primary vertices according to their estimated production time. After that, short-lived particles are found and the full event building process is finished. The last stage of the FLES package is the selection of events according to the requested trigger signatures.
This paper provides a theoretical assessment of gestures in the context of authoring image-related hypertexts by example of the museum information system WikiNect. To this end, a first implementation of gestural writing based on image schemata is provided (Lakoff in Women, fire, and dangerous things: what categories reveal about the mind. University of Chicago Press, Chicago, 1987). Gestural writing is defined as a sort of coding in which propositions are only expressed by means of gestures. In this respect, it is shown that image schemata allow for bridging between natural language predicates and gestural manifestations. Further, it is demonstrated that gestural writing primarily focuses on the perceptual level of image descriptions (Hollink et al. in Int J Hum Comput Stud 61(5):601–626, 2004). By exploring the metaphorical potential of image schemata, it is finally illustrated how to extend the expressiveness of gestural writing in order to reach the conceptual level of image descriptions. In this context, the paper paves the way for implementing museum information systems like WikiNect as systems of kinetic hypertext authoring based on full-fledged gestural writing.
Viruses rely completely on the hosts' machinery for translation of viral transcripts. However, for most viruses infecting humans, codon usage preferences (CUPrefs) do not match those of the host. Human papillomaviruses (HPVs) are a showcase to tackle this paradox: they present a large genotypic diversity and a broad range of phenotypic presentations, from asymptomatic infections to productive lesions and cancer. By applying phylogenetic inference and dimensionality reduction methods, we demonstrate first that genes in HPVs are poorly adapted to the average human CUPrefs, the only exception being capsid genes in viruses causing productive lesions. Phylogenetic relationships between HPVs explained only a small proportion of CUPrefs variation. Instead, the most important explanatory factor for viral CUPrefs was infection phenotype, as orthologous genes in viruses with similar clinical presentation displayed similar CUPrefs. Moreover, viral genes with similar spatiotemporal expression patterns also showed similar CUPrefs. Our results suggest that CUPrefs in HPVs reflect either variations in the mutation bias or differential selection pressures depending on the clinical presentation and expression timing. We propose that poor viral CUPrefs may be central to a trade-off between strong viral gene expression and the potential for eliciting protective immune response.
We provide elementary algorithms for two preservation theorems for first-order sentences (FO) on the class ℭd of all finite structures of degree at most d: For each FO-sentence that is preserved under extensions (homomorphisms) on ℭd, a ℭd-equivalent existential (existential-positive) FO-sentence can be constructed in 5-fold (4-fold) exponential time. This is complemented by lower bounds showing that a 3-fold exponential blow-up of the computed existential (existential-positive) sentence is unavoidable. Both algorithms can be extended (while maintaining the upper and lower bounds on their time complexity) to input first-order sentences with modulo m counting quantifiers (FO+MODm). Furthermore, we show that for an input FO-formula, a ℭd-equivalent Feferman-Vaught decomposition can be computed in 3-fold exponential time. We also provide a matching lower bound
This paper shows equivalence of several versions of applicative similarity and contextual approximation, and hence also of applicative bisimilarity and contextual equivalence, in LR, the deterministic call-by-need lambda calculus with letrec extended by data constructors, case-expressions and Haskell's seq-operator. LR models an untyped version of the core language of Haskell. The use of bisimilarities simplifies equivalence proofs in calculi and opens a way for more convenient correctness proofs for program transformations. The proof is by a fully abstract and surjective transfer into a call-by-name calculus, which is an extension of Abramsky's lazy lambda calculus. In the latter calculus equivalence of our similarities and contextual approximation can be shown by Howe's method. Similarity is transferred back to LR on the basis of an inductively defined similarity. The translation from the call-by-need letrec calculus into the extended call-by-name lambda calculus is the composition of two translations. The first translation replaces the call-by-need strategy by a call-by-name strategy and its correctness is shown by exploiting infinite trees which emerge by unfolding the letrec expressions. The second translation encodes letrec-expressions by using multi-fixpoint combinators and its correctness is shown syntactically by comparing reductions of both calculi. A further result of this paper is an isomorphism between the mentioned calculi, which is also an identity on letrec-free expressions.
Network graphs have become a popular tool to represent complex systems composed of many interacting subunits; especially in neuroscience, network graphs are increasingly used to represent and analyze functional interactions between multiple neural sources. Interactions are often reconstructed using pairwise bivariate analyses, overlooking the multivariate nature of interactions: it is neglected that investigating the effect of one source on a target necessitates to take all other sources as potential nuisance variables into account; also combinations of sources may act jointly on a given target. Bivariate analyses produce networks that may contain spurious interactions, which reduce the interpretability of the network and its graph metrics. A truly multivariate reconstruction, however, is computationally intractable because of the combinatorial explosion in the number of potential interactions. Thus, we have to resort to approximative methods to handle the intractability of multivariate interaction reconstruction, and thereby enable the use of networks in neuroscience. Here, we suggest such an approximative approach in the form of an algorithm that extends fast bivariate interaction reconstruction by identifying potentially spurious interactions post-hoc: the algorithm uses interaction delays reconstructed for directed bivariate interactions to tag potentially spurious edges on the basis of their timing signatures in the context of the surrounding network. Such tagged interactions may then be pruned, which produces a statistically conservative network approximation that is guaranteed to contain non-spurious interactions only. We describe the algorithm and present a reference implementation in MATLAB to test the algorithm’s performance on simulated networks as well as networks derived from magnetoencephalographic data. We discuss the algorithm in relation to other approximative multivariate methods and highlight suitable application scenarios. Our approach is a tractable and data-efficient way of reconstructing approximative networks of multivariate interactions. It is preferable if available data are limited or if fully multivariate approaches are computationally infeasible.
1D-3D hybrid modeling : from multi-compartment models to full resolution models in space and time
(2014)
Investigation of cellular and network dynamics in the brain by means of modeling and simulation has evolved into a highly interdisciplinary field, that uses sophisticated modeling and simulation approaches to understand distinct areas of brain function. Depending on the underlying complexity, these models vary in their level of detail, in order to cope with the attached computational cost. Hence for large network simulations, single neurons are typically reduced to time-dependent signal processors, dismissing the spatial aspect of each cell. For single cell or networks with relatively small numbers of neurons, general purpose simulators allow for space and time-dependent simulations of electrical signal processing, based on the cable equation theory. An emerging field in Computational Neuroscience encompasses a new level of detail by incorporating the full three-dimensional morphology of cells and organelles into three-dimensional, space and time-dependent, simulations. While every approach has its advantages and limitations, such as computational cost, integrated and methods-spanning simulation approaches, depending on the network size could establish new ways to investigate the brain. In this paper we present a hybrid simulation approach, that makes use of reduced 1D-models using e.g., the NEURON simulator—which couples to fully resolved models for simulating cellular and sub-cellular dynamics, including the detailed three-dimensional morphology of neurons and organelles. In order to couple 1D- and 3D-simulations, we present a geometry-, membrane potential- and intracellular concentration mapping framework, with which graph- based morphologies, e.g., in the swc- or hoc-format, are mapped to full surface and volume representations of the neuron and computational data from 1D-simulations can be used as boundary conditions for full 3D simulations and vice versa. Thus, established models and data, based on general purpose 1D-simulators, can be directly coupled to the emerging field of fully resolved, highly detailed 3D-modeling approaches. We present the developed general framework for 1D/3D hybrid modeling and apply it to investigate electrically active neurons and their intracellular spatio-temporal calcium dynamics.
Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing length of their 3' untranslated region (3'UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3'UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3' end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3' end sequencing data with prediction algorithms of miRNA binding sites, allowing to further improve prediction algorithms. Current databases lack correct information about 3'UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/
Aging of biological systems is controlled by various processes which have a potential impact on gene expression. Here we report a genome-wide transcriptome analysis of the fungal aging model Podospora anserina. Total RNA of three individuals of defined age were pooled and analyzed by SuperSAGE (serial analysis of gene expression). A bioinformatics analysis identified different molecular pathways to be affected during aging. While the abundance of transcripts linked to ribosomes and to the proteasome quality control system were found to decrease during aging, those associated with autophagy increase, suggesting that autophagy may act as a compensatory quality control pathway. Transcript profiles associated with the energy metabolism including mitochondrial functions were identified to fluctuate during aging. Comparison of wild-type transcripts, which are continuously down-regulated during aging, with those down-regulated in the long-lived, copper-uptake mutant grisea, validated the relevance of age-related changes in cellular copper metabolism. Overall, we (i) present a unique age-related data set of a longitudinal study of the experimental aging model P. anserina which represents a reference resource for future investigations in a variety of organisms, (ii) suggest autophagy to be a key quality control pathway that becomes active once other pathways fail, and (iii) present testable predictions for subsequent experimental investigations.
We introduce tree-width for first order formulae φ, fotw(φ). We show that computing fotw is fixed-parameter tractable with parameter fotw. Moreover, we show that on classes of formulae of bounded fotw, model checking is fixed parameter tractable, with parameter the length of the formula. This is done by translating a formula φ with fotw(φ)<k into a formula of the k-variable fragment Lk of first order logic. For fixed k, the question whether a given first order formula is equivalent to an Lk formula is undecidable. In contrast, the classes of first order formulae with bounded fotw are fragments of first order logic for which the equivalence is decidable. Our notion of tree-width generalises tree-width of conjunctive queries to arbitrary formulae of first order logic by taking into account the quantifier interaction in a formula. Moreover, it is more powerful than the notion of elimination-width of quantified constraint formulae, defined by Chen and Dalmau (CSL 2005): for quantified constraint formulae, both bounded elimination-width and bounded fotw allow for model checking in polynomial time. We prove that fotw of a quantified constraint formula φ is bounded by the elimination-width of φ, and we exhibit a class of quantified constraint formulae with bounded fotw, that has unbounded elimination-width. A similar comparison holds for strict tree-width of non-recursive stratified datalog as defined by Flum, Frick, and Grohe (JACM 49, 2002). Finally, we show that fotw has a characterization in terms of a cops and robbers game without monotonicity cost.
Poster presentation: Twenty Second Annual Computational Neuroscience Meeting: CNS*2013. Paris, France. 13-18 July 2013.
The synaptic cleft is an extracellular domain that is capable of relaying a presynaptically received electrical signal by diffusive neurotransmitters to the postsynaptic membrane. The cleft is trans-synaptically bridged by ring-like shaped clusters of pre- and postsynaptically localized calcium-dependent adhesion proteins of the N-Cadherin type and is possibly the smallest intercircuit in nervous systems [1]. The strength of association between the pre- and postsynaptic membranes can account for synaptic plasticity such as long-term potentiation [2]. Through neuronal activity the intra- and extracellular calcium levels are modulated through calcium exchangers embedded in the pre- and postsynaptic membrane. Variations of the concentration of cleft calcium induces changes in the N-Cadherin-zipper, that in synaptic resting states is rigid and tightly connects the pre- and postsynaptic domain. During synaptic activity calcium concentrations are hypothesized to drop below critical thresholds which leads to loosening of the N-Cadherin connections and subsequently "unzips" the Cadherin-mediated connection. These processes may result in changes in synaptic strength [2]. In order to investigate the calcium-mediated N-Cadherin dynamics at the synaptic cleft, we developed a three-dimensional model including the cleft morphology and all prominent calcium exchangers and corresponding density distributions [3-6]. The necessity for a fully three-dimensional model becomes apparent, when investigating the effects of the spatial architecture of the synapse [7], [8]. Our data show, that the localization of calcium channels with respect to the N-Cadherin ring has substantial effects on the time-scales on which the Cadherin-zipper switches between states, ranging from seconds to minutes. This will have significant effects on synaptic signaling. Furthermore we see, that high-frequency action potential firing can only be relayed to the Calcium/N-Cadherin-system at a synapse under precise spatial synaptic reorganization.
To truly appreciate the myriad of events which relate synaptic function and vesicle dynamics, simulations should be done in a spatially realistic environment. This holds true in particular in order to explain as well the rather astonishing motor patterns which we observed within in vivo recordings which underlie peristaltic contractionsas well as the shape of the EPSPs at different forms of long-term stimulation, presented both here, at a well characterized synapse, the neuromuscular junction (NMJ) of the Drosophila larva (c.f. Figure 1). To this end, we have employed a reductionist approach and generated three dimensional models of single presynaptic boutons at the Drosophila larval NMJ. Vesicle dynamics are described by diffusion-like partial differential equations which are solved numerically on unstructured grids using the uG platform. In our model we varied parameters such as bouton-size, vesicle output probability (Po), stimulation frequency and number of synapses, to observe how altering these parameters effected bouton function. Hence we demonstrate that the morphologic and physiologic specialization maybe a convergent evolutionary adaptation to regulate the trade off between sustained, low output, and short term, high output, synaptic signals. There seems to be a biologically meaningful explanation for the co-existence of the two different bouton types as previously observed at the NMJ (characterized especially by the relation between size and Po), the assigning of two different tasks with respect to short- and long-time behaviour could allow for an optimized interplay of different synapse types. We can present astonishing similar results of experimental and simulation data which could be gained in particular without any data fitting, however based only on biophysical values which could be taken from different experimental results. As a side product, we demonstrate how advanced methods from numerical mathematics could help in future to resolve also other difficult experimental neurobiological issues.
Functional modules of metabolic networks are essential for understanding the metabolism of an organism as a whole. With the vast amount of experimental data and the construction of complex and large-scale, often genome-wide, models, the computer-aided identification of functional modules becomes more and more important. Since steady states play a key role in biology, many methods have been developed in that context, for example, elementary flux modes, extreme pathways, transition invariants and place invariants. Metabolic networks can be studied also from the point of view of graph theory, and algorithms for graph decomposition have been applied for the identification of functional modules. A prominent and currently intensively discussed field of methods in graph theory addresses the Q-modularity. In this paper, we recall known concepts of module detection based on the steady-state assumption, focusing on transition-invariants (elementary modes) and their computation as minimal solutions of systems of Diophantine equations. We present the Fourier-Motzkin algorithm in detail. Afterwards, we introduce the Q-modularity as an example for a useful non-steady-state method and its application to metabolic networks. To illustrate and discuss the concepts of invariants and Q-modularity, we apply a part of the central carbon metabolism in potato tubers (Solanum tuberosum) as running example. The intention of the paper is to give a compact presentation of known steady-state concepts from a graph-theoretical viewpoint in the context of network decomposition and reduction and to introduce the application of Q-modularity to metabolic Petri net models.
Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks’ structure and function. However, this task is very computationally demanding, because it is highly associated with the graph isomorphism which is an NP problem (not known to belong to P or NP-complete subsets yet). Accordingly, this research is endeavoring to decrease the need to call NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure in the heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks approve the overal superiority of the proposed algorithm, namely QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fastest in constructing the central data structure of the algorithm from scratch based on the input network.
This article shows that there exist two particular linear orders such that first-order logic with these two linear orders has the same expressive power as first-order logic with the Bit-predicate FO(Bit). As a corollary we obtain that there also exists a built-in permutation such that first-order logic with a linear order and this permutation is as expressive as FO(Bit).
Succinctness is a natural measure for comparing the strength of different logics. Intuitively, a logic L_1 is more succinct than another logic L_2 if all properties that can be expressed in L_2 can be expressed in L_1 by formulas of (approximately) the same size, but some properties can be expressed in L_1 by (significantly) smaller formulas.
We study the succinctness of logics on linear orders. Our first theorem is concerned with the finite variable fragments of first-order logic. We prove that:
(i) Up to a polynomial factor, the 2- and the 3-variable fragments of first-order logic on linear orders have the same succinctness. (ii) The 4-variable fragment is exponentially more succinct than the 3-variable fragment. Our second main result compares the succinctness of first-order logic on linear orders with that of monadic second-order logic. We prove that the fragment of monadic second-order logic that has the same expressiveness as first-order logic on linear orders is non-elementarily more succinct than first-order logic.
Effect sizes in experimental pain produced by gender, genetic variants and sensitization procedures
(2011)
Background: Various effects on pain have been reported with respect to their statistical significance, but a standardized measure of effect size has been rarely added. Such a measure would ease comparison of the magnitude of the effects across studies, for example the effect of gender on heat pain with the effect of a genetic variant on pressure pain. Methodology/Principal Findings: Effect sizes on pain thresholds to stimuli consisting of heat, cold, blunt pressure, punctuate pressure and electrical current, administered to 125 subjects, were analyzed for 29 common variants in eight human genes reportedly modulating pain, gender and sensitization procedures using capsaicin or menthol. The genotype explained 0–5.9% of the total interindividual variance in pain thresholds to various stimuli and produced mainly small effects (Cohen's d 0–1.8). The largest effect had the TRPA1 rs13255063T/rs11988795G haplotype explaining >5% of the variance in electrical pain thresholds and conferring lower pain sensitivity to homozygous carriers. Gender produced larger effect sizes than most variant alleles (1–14.8% explained variance, Cohen's d 0.2–0.8), with higher pain sensitivity in women than in men. Sensitization by capsaicin or menthol explained up to 63% of the total variance (4.7–62.8%) and produced largest effects according to Cohen's d (0.4–2.6), especially heat sensitization by capsaicin (Cohen's d = 2.6). Conclusions: Sensitization, gender and genetic variants produce effects on pain in the mentioned order of effect sizes. The present report may provide a basis for comparative discussions of factors influencing pain.
In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutor’s dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations. Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysis
Poster presentation from Twentieth Annual Computational Neuroscience Meeting: CNS*2011 Stockholm, Sweden. 23-28 July 2011. To truly appreciate the myriad of events which relate synaptic function and vesicle dynamics, simulations should be done in a spatially realistic environment. This holds true in particular in order to explain the rather astonishing motor patterns presented here which we observed within in vivo recordings which underlie peristaltic contractions at a well characterized synapse, the neuromuscular junction (NMJ) of the Drosophila larva. To this end, we have employed a reductionist approach and generated three dimensional models of single presynaptic boutons at the Drosophila larval NMJ. Vesicle dynamics are described by diffusion-like partial differential equations which are solved numerically on unstructured grids using the uG platform. In our model we varied parameters such as bouton-size, vesicle output probability (Po), stimulation frequency and number of synapses, to observe how altering these parameters effected bouton function. Hence we demonstrate that the morphologic and physiologic specialization maybe a convergent evolutionary adaptation to regulate the trade off between sustained, low output, and short term, high output, synaptic signals. There seems to be a biologically meaningful explanation for the co-existence of the two different bouton types as previously observed at the NMJ (characterized especially by the relation between size and Po),the assigning of two different tasks with respect to short- and long-time behaviour could allow for an optimized interplay of different synapse types. As a side product, we demonstrate how advanced methods from numerical mathematics could help in future to resolve also other difficult experimental neurobiological issues.
Since the description of sepsis by Schottmüller in 1914, the amount on knowledge available on sepsis and its underlying pathophysiology has substantially increased. Epidemiologic examinations of abdominal septic shock patients show the potential for high risk posed by and the extensive therapy situation in the intensive care unit (ICU) (5). Unfortunately, until now it has not been possible to significantly reduce the mortality rate of septic shock, which is as high as 50-60% worldwide, although PROWESS' results (1) are encouraging. This paper summarizes the main results of the MEDAN project and their medical impacts. Several aspects are already published, see the references. The heterogeneity of patient groups and the variations in therapy strategies is seen as one of the main problems for sepsis trials. In the MEDAN multi-center study of 71 intensive care units in Germany, a group of 382 patients made up exclusively of abdominal septic shock patients who met the consensus criteria for septic shock (3) was analysed. For use within scores or stand-alone experiments variables are often studied as isolated variables, not as a multidimensional whole, e.g. a recent study takes a look at the role thrombocytes play (15). To avoid this limitation, our study compares several established scores (SOFA, APACHE II, SAPS II, MODS) by a multi-dimensional neuronal network analysis. For outcome prediction the data of 382 patients was analysed by using most of the commonly documented vital parameters and doses of medicine (metric variables). Data was collected in German hospitals from 1998 to 2001. The 382 handwritten patient records were transferred to an electronic database giving the amount of 2.5 million data entries. The metric data contained in the database is composed of daily measurements and doses of medicine. We used range and plausibility checks to allow no faulty data in the electronic database. 187 of the 382 patients are deceased (49 %).
Attraction and commercial success of web sites depend heavily on the additional values visitors may find. Here, individual, automatically obtained and maintained user profiles are the key for user satisfaction. This contribution shows for the example of a cooking information site how user profiles might be obtained using category information provided by cooking recipes. It is shown that metrical distance functions and standard clustering procedures lead to erroneous results. Instead, we propose a new mutual information based clustering approach and outline its implications for the example of user profiling.
Data driven automatic model selection and parameter adaptation – a case study for septic shock
(2004)
In bioinformatics, biochemical pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically learning the parameters is necessary. This paper propose as model selection criterion the least complex description of the observed data by the model, the minimum description length. For the small, but important example of inflammation modeling the performance of the approach is evaluated.
In bioinformatics, biochemical signal pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically obtaining the most appropriate model and learning its parameters is extremely interesting. One of the most often used approaches for model selection is to choose the least complex model which “fits the needs”. For noisy measurements, the model which has the smallest mean squared error of the observed data results in a model which fits too accurately to the data – it is overfitting. Such a model will perform good on the training data, but worse on unknown data. This paper propose as model selection criterion the least complex description of the observed data by the model, the minimum description length. For the small, but important example of inflammation modeling the performance of the approach is evaluated. Keywords: biochemical pathways, differential equations, septic shock, parameter estimation, overfitting, minimum description length.
In bioinformatics, biochemical pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically learning the parameters is necessary. In this paper, for the small, important example of inflammation modeling a network is constructed and different learning algorithms are proposed. It turned out that due to the nonlinear dynamics evolutionary approaches are necessary to fit the parameters for sparse, given data. Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence - ICTAI 2003
In bioinformatics, biochemical pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically learning the parameters is necessary. In this paper, for the small, important example of inflammation modeling a network is constructed and different learning algorithms are proposed. It turned out that due to the nonlinear dynamics evolutionary approaches are necessary to fit the parameters for sparse, given data. Keywords: model parameter adaption, septic shock. coupled differential equations, genetic algorithm.
In contrast to the symbolic approach, neural networks seldom are designed to explain what they have learned. This is a major obstacle for its use in everyday life. With the appearance of neuro-fuzzy systems which use vague, human-like categories the situation has changed. Based on the well-known mechanisms of learning for RBF networks, a special neuro-fuzzy interface is proposed in this paper. It is especially useful in medical applications, using the notation and habits of physicians and other medically trained people. As an example, a liver disease diagnosis system is presented.
The prevention of credit card fraud is an important application for prediction techniques. One major obstacle for using neural network training techniques is the high necessary diagnostic quality: Since only one financial transaction of a thousand is invalid no prediction success less than 99.9% is acceptable. Due to these credit card transaction proportions complete new concepts had to be developed and tested on real credit card data. This paper shows how advanced data mining techniques and neural network algorithm can be combined successfully to obtain a high fraud coverage combined with a low false alarm rate.
This paper describes the use of a Radial Basis Function (RBF) neural network in the approximation of process parameters for the extrusion of a rubber profile in tyre production. After introducing the rubber industry problem, the RBF network model and the RBF net learning algorithm are developed, which uses a growing number of RBF units to compensate the approximation error up to the desired error limit. Its performance is shown for simple analytic examples. Then the paper describes the modelling of the industrial problem. Simulations show good results, even when using only a few training samples. The paper is concluded by a discussion of possible systematic error influences, improvements and potential generalisation benefits. Keywords: Adaptive process control; Parameter estimation; RBF-nets; Rubber extrusion
In this paper we regard first the situation where parallel channels are disturbed by noise. With the goal of maximal information conservation we deduce the conditions for a transform which "immunizes" the channels against noise influence before the signals are used in later operations. It shows up that the signals have to be decorrelated and normalized by the filter which corresponds for the case of one channel to the classical result of Shannon. Additional simulations for image encoding and decoding show that this constitutes an efficient approach for noise suppression. Furthermore, by a corresponding objective function we deduce the stochastic and deterministic learning rules for a neural network that implements the data orthonormalization. In comparison with other already existing normalization networks our network shows approximately the same in the stochastic case but, by its generic deduction ensures the convergence and enables the use as independent building block in other contexts, e.g. whitening for independent component analysis. Keywords: information conservation, whitening filter, data orthonormalization network, image encoding, noise suppression.
This paper describes the use of a radial basis function (RBF) neural network. It approximates the process parameters for the extrusion of a rubber profile used in tyre production. After introducing the problem, we describe the RBF net algorithm and the modeling of the industrial problem. The algorithm shows good results even using only a few training samples. It turns out that the „curse of dimensions“ plays an important role in the model. The paper concludes by a discussion of possible systematic error influences and improvements.
The paper focuses on the division of the sensor field into subsets of sensor events and proposes the linear transformation with the smallest achievable error for reproduction: the transform coding approach using the principal component analysis (PCA). For the implementation of the PCA, this paper introduces a new symmetrical, lateral inhibited neural network model, proposes an objective function for it and deduces the corresponding learning rules. The necessary conditions for the learning rate and the inhibition parameter for balancing the crosscorrelations vs. the autocorrelations are computed. The simulation reveals that an increasing inhibition can speed up the convergence process in the beginning slightly. In the remaining paper, the application of the network in picture encoding is discussed. Here, the use of non-completely connected networks for the self-organized formation of templates in cellular neural networks is shown. It turns out that the self-organizing Kohonen map is just the non-linear, first order approximation of a general self-organizing scheme. Hereby, the classical transform picture coding is changed to a parallel, local model of linear transformation by locally changing sets of self-organized eigenvector projections with overlapping input receptive fields. This approach favors an effective, cheap implementation of sensor encoding directly on the sensor chip. Keywords: Transform coding, Principal component analysis, Lateral inhibited network, Cellular neural network, Kohonen map, Self-organized eigenvector jets.
After a short introduction into traditional image transform coding, multirate systems and multiscale signal coding the paper focuses on the subject of image encoding by a neural network. Taking also noise into account a network model is proposed which not only learns the optimal localized basis functions for the transform but also learns to implement a whitening filter by multi-resolution encoding. A simulation showing the multi-resolution capabilitys concludes the contribution.
We present a framework for the self-organized formation of high level learning by a statistical preprocessing of features. The paper focuses first on the formation of the features in the context of layers of feature processing units as a kind of resource-restricted associative multiresolution learning We clame that such an architecture must reach maturity by basic statistical proportions, optimizing the information processing capabilities of each layer. The final symbolic output is learned by pure association of features of different levels and kind of sensorial input. Finally, we also show that common error-correction learning for motor skills can be accomplished also by non-specific associative learning. Keywords: feedforward network layers, maximal information gain, restricted Hebbian learning, cellular neural nets, evolutionary associative learning
It is well known that artificial neural nets can be used as approximators of any continuous functions to any desired degree and therefore be used e.g. in high - speed, real-time process control. Nevertheless, for a given application and a given network architecture the non-trivial task remains to determine the necessary number of neurons and the necessary accuracy (number of bits) per weight for a satisfactory operation which are critical issues in VLSI and computer implementations of nontrivial tasks. In this paper the accuracy of the weights and the number of neurons are seen as general system parameters which determine the maximal approximation error by the absolute amount and the relative distribution of information contained in the network. We define as the error-bounded network descriptional complexity the minimal number of bits for a class of approximation networks which show a certain approximation error and achieve the conditions for this goal by the new principle of optimal information distribution. For two examples, a simple linear approximation of a non-linear, quadratic function and a non-linear approximation of the inverse kinematic transformation used in robot manipulator control, the principle of optimal information distribution gives the the optimal number of neurons and the resolutions of the variables, i.e. the minimal amount of storage for the neural net. Keywords: Kolmogorov complexity, e-Entropy, rate-distortion theory, approximation networks, information distribution, weight resolutions, Kohonen mapping, robot control.
We provide the first non-trivial result on dynamic breadth-first search (BFS) in external-memory: For general sparse undirected graphs of initially $n$ nodes and O(n) edges and monotone update sequences of either $\Theta(n)$ edge insertions or $\Theta(n)$ edge deletions, we prove an amortized high-probability bound of $O(n/B^{2/3}+\sort(n)\cdot \log B)$ I/Os per update. In contrast, the currently best approach for static BFS on sparse undirected graphs requires $\Omega(n/B^{1/2}+\sort(n))$ I/Os. 1998 ACM Subject Classification: F.2.2. Key words and phrases: External Memory, Dynamic Graph Algorithms, BFS, Randomization.
Poster presentation: The analysis of neuronal processes distributed across multiple cortical areas aims at the identification of interactions between signals recorded at different sites. Such interactions can be described by measuring the stability of phase angles in the case of oscillatory signals or other forms of signal dependencies for less regular signals. Before, however, any form of interaction can be analyzed at a given time and frequency, it is necessary to assess whether all potentially contributing signals are present. We have developed a new statistical procedure for the detection of coincident power in multiple simultaneously recorded analog signals, allowing the classification of events as 'non-accidental co-activation'. This method can effectively operate on single trials, each lasting only for a few seconds. Signals need to be transformed into time-frequency space, e.g. by applying a short-time Fourier transformation using a Gaussian window. The discrete wavelet transform (DWT) is used in order to weight the resulting power patterns according to their frequency. Subsequently, the weighted power patterns are binarized via applying a threshold. At this final stage, significant power coincidence is determined across all subgroups of channel combinations for individual frequencies by selecting the maximum ratio between observed and expected duration of co-activation as test statistic. The null hypothesis that the activity in each channel is independent from the activity in every other channel is simulated by independent, random rotation of the respective activity patterns. We applied this procedure to single trials of multiple simultaneously sampled local field potentials (LFPs) obtained from occipital, parietal, central and precentral areas of three macaque monkeys. Since their task was to use visual cues to perform a precise arm movement, co-activation of numerous cortical sites was expected. In a data set with 17 channels analyzed, up to 13 sites expressed simultaneous power in the range between 5 and 240 Hz. On average, more than 50% of active channels participated at least once in a significant power co-activation pattern (PCP). Because the significance of such PCPs can be evaluated at the level of single trials, we are confident that this procedure is useful to study single trial variability with sufficient accuracy that much of the behavioral variability can be explained by the dynamics of the underlying distributed neuronal processes.
Poster presentation: Introduction The ability of neurons to emit different firing patterns is considered relevant for neuronal information processing. In dopaminergic neurons, prominent patterns include highly regular pacemakers with separate spikes and stereotyped intervals, processes with repetitive bursts and partial regularity, and irregular spike trains with nonstationary properties. In order to model and quantify these processes and the variability of their patterns with respect to pharmacological and cellular properties, we aim to describe the two dimensions of burstiness and regularity in a single model framework. Methods We present a stochastic spike train model in which the degree of burstiness and the regularity of the oscillation are described independently and with two simple parameters. In this model, a background oscillation with independent and normally distributed intervals gives rise to Poissonian spike packets with a Gaussian firing intensity. The variability of inter-burst intervals and the average number of spikes in each burst indicate regularity and burstiness, respectively. These parameters can be estimated by fitting the model to the autocorrelograms. This allows to assign every spike train a position in the two-dimensional space described by regularity and burstiness and thus, to investigate the dependence of the firing patterns on different experimental conditions. Finally, burst detection in single spike trains is possible within the model because the parameter estimates determine the appropriate bandwidth that should be used for burst identification. Results and Discussion We applied the model to a sample data set obtained from dopaminergic substantia nigra and ventral tegmental area neurons recorded extracellularly in vivo and studied differences between the firing activity of dopaminergic neurons in wildtype and K-ATP channel knock-out mice. The model is able to represent a variety of discharge patterns and to describe changes induced pharmacologically. It provides a simple and objective classification scheme for the observed spike trains into pacemaker, irregular and bursty processes. In addition to the simple classification, changes in the parameters can be studied quantitatively, also including the properties related to bursting behavior. Interestingly, the proposed algorithm for burst detection may be applicable also to spike trains with nonstationary firing rates if the remaining parameters are unaffected. Thus, the proposed model and its burst detection algorithm can be useful for the description and investigation of neuronal firing patterns and their variability with cellular and experimental conditions.
Poster presentation: An important challenge in neuroscience is understanding how networks of neurons go about processing information. Synapses are thought to play an essential role in cellular information processing however quantitative and mathematical models of the underlying physiologic processes that occur at synaptic active zones are lacking. We are generating mathematical models of synaptic vesicle dynamics at a well-characterized model synapse, the Drosophila larval neuromuscular junction. This synapse's simplicity, accessibility to various electrophysiological recording and imaging techniques, and the genetic malleability intrinsic to Drosophila system make it ideal for computational and mathematical studies. We have employed a reductionist approach and started by modeling single presynaptic boutons. Synaptic vesicles can be divided into different pools; however, a quantitative understanding of their dynamics at the Drosophila neuromuscular junction is lacking [4]. We performed biologically realistic simulations of high and low release probability boutons [3] using partial differential equations (PDE) taking into account not only the evolution in time but also the spatial structure in two dimensions (the extension to three dimensions will be implemented soon). PDEs are solved using UG, a program library for the calculation of multi-dimensional PDEs solved using a finite volume approach and implicit time stepping methods leading to extended linear equation systems be solvedwith multi-grid methods [3,4]. Numerical calculations are done on multi-processor computers for fast calculations using different parameters in order to asses the biological feasibility of different models. In preliminary simulations, we modeled vesicle dynamics as a diffusion process describing exocytosis as Neumann streams at synaptic active zones. The initial results obtained with these models are consistent with experimental data. However, this should be regarded as a work in progress. Further refinements will be implemented, including simulations using morphologically realistic geometries which were generated from confocal scans of the neuromuscular junction using NeuRA (a Neuron Reconstruction Algorithm). Other parameters such as glutamate diffusion and reuptake dynamics, as well as postsynaptic receptor kinetics will be incorporated as well.
Poster presentation: Introduction Dopaminergic neurons in the midbrain show a variety of firing patterns, ranging from very regular firing pacemaker cells to bursty and irregular neurons. The effects of different experimental conditions (like pharmacological treatment or genetical manipulations) on these neuronal discharge patterns may be subtle. Applying a stochastic model is a quantitative approach to reveal these changes. ...
Poster presentation: Introduction The brain is a highly interconnected network of constantly interacting units. Understanding the collective behavior of these units requires a multi-dimensional approach. The results of such analyses are hard to visualize and interpret. Hence tools capable of dealing with such tasks become imperative. ....
The pathogenesis of nodular lymphocyte–predominant Hodgkin lymphoma (NLPHL) and its relationship to other lymphomas are largely unknown. This is partly because of the technical challenge of analyzing its rare neoplastic lymphocytic and histiocytic (L&H) cells, which are dispersed in an abundant nonneoplastic cellular microenvironment. We performed a genome-wide expression study of microdissected L&H lymphoma cells in comparison to normal and other malignant B cells that indicated a relationship of L&H cells to and/or that they originate from germinal center B cells at the transition to memory B cells. L&H cells show a surprisingly high similarity to the tumor cells of T cell–rich B cell lymphoma and classical Hodgkin lymphoma, a partial loss of their B cell phenotype, and deregulation of many apoptosis regulators and putative oncogenes. Importantly, L&H cells are characterized by constitutive nuclear factor {kappa}B activity and aberrant extracellular signal-regulated kinase signaling. Thus, these findings shed new light on the nature of L&H cells, reveal several novel pathogenetic mechanisms in NLPHL, and may help in differential diagnosis and lead to novel therapeutic strategies.
Background Although current molecular clock methods offer greater flexibility in modelling historical evolutionary events, calibration of the clock with dates from the fossil record is still problematic for many groups. Here we implement several new approaches in molecular dating to estimate evolutionary ages of Lacertidae, an Old World family of lizards with a poor fossil record and uncertain phylogeny. Four different models of rate variation are tested in a new program for Bayesian phylogenetic analysis called TreeTime, based on a combination of mitochondrial and nuclear gene sequences. We incorporate paleontological uncertainty into divergence estimates by expressing multiple calibration dates as a range of probabilistic distributions. We also test the reliability of our proposed calibrations by exploring effects of individual priors on posterior estimates. Results According to the most reliable model, as indicated by Bayes factor comparison, modern lacertids arose shortly after the K/T transition and entered Africa about 45 million years ago, with the majority of their African radiation occurring in the Eocene and Oligocene. Our findings indicate much earlier origins for these clades than previously reported, and we discuss our results in light of paleogeographic trends during the Cenozoic. Conclusions This study represents the first attempt to estimate evolutionary ages of a specific group of reptiles exhibiting uncertain phylogenetic relationships, molecular rate variation and a poor fossil record. Our results emphasize the sensitivity of molecular divergence dates to fossil calibrations, and support the use of combined molecular data sets and multiple, well-spaced dates from the fossil record as minimum node constraints. The bioinformatics program used here, TreeTime, is publicly available, and we recommend its use for molecular dating of taxa faced with similar challenges.
Exported proteases of Helicobacter pylori (H. pylori) are potentially involved in pathogen-associated disorders leading to gastric inflammation and neoplasia. By comprehensive sequence screening of the H. pylori proteome for predicted secreted proteases, we retrieved several candidate genes. We detected caseinolytic activities of several such proteases, which are released independently from the H. pylori type IV secretion system encoded by the cag pathogenicity island (cagPAI). Among these, we found the predicted serine protease HtrA (Hp1019), which was previously identified in the bacterial secretome of H. pylori. Importantly, we further found that the H. pylori genes hp1018 and hp1019 represent a single gene likely coding for an exported protein. Here, we directly verified proteolytic activity of HtrA in vitro and identified the HtrA protease in zymograms by mass spectrometry. Overexpressed and purified HtrA exhibited pronounced proteolytic activity, which is inactivated after mutation of Ser205 to alanine in the predicted active center of HtrA. These data demonstrate that H. pylori secretes HtrA as an active protease, which might represent a novel candidate target for therapeutic intervention strategies.
We investigate unary regular languages and compare deterministic finite automata (DFA’s), nondeterministic finite automata (NFA’s) and probabilistic finite automata (PFA’s) with respect to their size. Given a unary PFA with n states and an e-isolated cutpoint, we show that the minimal equivalent DFA has at most n exp 1/2e states in its cycle. This result is almost optimal, since for any alpha < 1 a family of PFA’s can be constructed such that every equivalent DFA has at least n exp alpha/2e states. Thus we show that for the model of probabilistic automata with a constant error bound, there is only a polynomial blowup for cyclic languages. Given a unary NFA with n states, we show that efficiently approximating the size of a minimal equivalent NFA within the factor sqrt(n)/ln n is impossible unless P = NP. This result even holds under the promise that the accepted language is cyclic. On the other hand we show that we can approximate a minimal NFA within the factor ln n, if we are given a cyclic unary n-state DFA.
We present a biologically-inspired system for real-time, feed-forward object recognition in cluttered scenes. Our system utilizes a vocabulary of very sparse features that are shared between and within different object models. To detect objects in a novel scene, these features are located in the image, and each detected feature votes for all objects that are consistent with its presence. Due to the sharing of features between object models our approach is more scalable to large object databases than traditional methods. To demonstrate the utility of this approach, we train our system to recognize any of 50 objects in everyday cluttered scenes with substantial occlusion. Without further optimization we also demonstrate near-perfect recognition on a standard 3-D recognition problem. Our system has an interpretation as a sparsely connected feed-forward neural network, making it a viable model for fast, feed-forward object recognition in the primate visual system.
A new approach to optimize multilevel logic circuits is introduced. Given a multilevel circuit, the synthesis method optimizes its area while simultaneously enhancing its random pattern testability. The method is based on structural transformations at the gate level. New transformations involving EX-OR gates as well as Reed–Muller expansions have been introduced in the synthesis of multilevel circuits. This method is augmented with transformations that specifically enhance random-pattern testability while reducing the area. Testability enhancement is an integral part of our synthesis methodology. Experimental results show that the proposed methodology not only can achieve lower area than other similar tools, but that it achieves better testability compared to available testability enhancement tools such as tstfx. Specifically for ISCAS-85 benchmark circuits, it was observed that EX-OR gate-based transformations successfully contributed toward generating smaller circuits compared to other state-of-the-art logic optimization tools.
Channel routing is an NP-complete problem. Therefore, it is likely that there is no efficient algorithm solving this problem exactly.In this paper, we show that channel routing is a fixed-parameter tractable problem and that we can find a solution in linear time for a fixed channel width.We implemented our approach for the restricted layer model. The algorithm finds an optimal route for channels with up to 13 tracks within minutes or up to 11 tracks within seconds.Such narrow channels occur for example as a leaf problem of hierarchical routers or within standard cell generators.
We present a theoretical analysis of structural FSM traversal, which is the basis for the sequential equivalence checking algorithm Record & Play presented earlier. We compare the convergence behaviour of exact and approximative structural FSM traversal with that of standard BDD-based FSM traversal. We show that for most circuits encountered in practice exact structural FSM traversal reaches the fixed point as fast as symbolic FSM traversal, while approximation can significantly reduce in the number of iterations needed. Our experiments confirm these results.
We present the FPGA implementation of an algorithm [4] that computes implications between signal values in a boolean network. The research was performed as a masterrsquos thesis [5] at the University of Frankfurt. The recursive algorithm is rather complex for a hardware realization and therefore the FPGA implementation is an interesting example for the potential of reconfigurable computing beyond systolic algorithms. A circuit generator was written that transforms a boolean network into a network of small processing elements and a global control logic which together implement the algorithm. The resulting circuit performs the computation two orders of magnitudes faster than a software implementation run by a conventional workstation.
One of the most severe short-comings of currently available equivalence checkers is their inability to verify integer multipliers. In this paper, we present a bit level reverse-engineering technique that can be integrated into standard equivalence checking flows. We propose a Boolean mapping algorithm that extracts a network of half adders from the gate netlist of an addition circuit. Once the arithmetic bit level representation of the circuit is obtained, equivalence checking can be performed using simple arithmetic operations. Experimental results show the promise of our approach.
Considered are the classes QL (quasilinear) and NQL (nondet quasllmear) of all those problems that can be solved by deterministic (nondetermlnlsttc, respectively) Turmg machines in time O(n(log n) ~) for some k Effloent algorithms have time bounds of th~s type, it is argued. Many of the "exhausUve search" type problems such as satlsflablhty and colorabdlty are complete in NQL with respect to reductions that take O(n(log n) k) steps This lmphes that QL = NQL iff satisfiabdlty is m QL CR CATEGORIES: 5.25
In this paper we present a non-deterministic call-by-need (untyped) lambda calculus lambda nd with a constant choice and a let-syntax that models sharing. Our main result is that lambda nd has the nice operational properties of the standard lambda calculus: confluence on sets of expressions, and normal order reduction is sufficient to reach head normal form. Using a strong contextual equivalence we show correctness of several program transformations. In particular of lambdalifting using deterministic maximal free expressions. These results show that lambda nd is a new and also natural combination of non-determinism and lambda-calculus, which has a lot of opportunities for parallel evaluation. An intended application of lambda nd is as a foundation for compiling lazy functional programming languages with I/O based on direct calls. The set of correct program transformations can be rigorously distinguished from non-correct ones. All program transformations are permitted with the slight exception that for transformations like common subexpression elimination and lambda-lifting with maximal free expressions the involved subexpressions have to be deterministic ones.