Learning to solve graph tasks is one of the key prerequisites of acquiring domain-specific knowledge in most study domains. Analyses of graph understanding often use eye-tracking and focus on analyzing how much time students spend gazing at particular areas of a graph, so-called Areas of Interest (AOIs). To gain a deeper insight into students’ task-solving process, we argue that the gaze shifts between students’ fixations on different AOIs (so-called transitions) also need to be included in holistic analyses of graph understanding. Thus, we introduced Epistemic Network Analysis (ENA) as a novel approach to analyze eye-tracking data of 23 university students who solved eight multiple-choice graph tasks in physics and economics. ENA is a method for quantifying, visualizing, and interpreting network data, allowing a weighted analysis of the gaze patterns of both correct and incorrect graph task solvers that considers the interrelations between fixations and transitions. After analyzing the differences in the number of fixations and the number of single transitions between correct and incorrect solvers, we conducted an ENA for each task. We demonstrate that an isolated analysis of fixations and transitions provides only a limited insight into graph solving behavior. In contrast, ENA identifies differences between the gaze patterns of students who solved the graph tasks correctly and incorrectly across the multiple graph tasks. For instance, incorrect solvers shifted their gaze from the graph to the x-axis and from the question to the graph comparatively more often than correct solvers. The results indicate that incorrect solvers often have problems transferring textual information into graphical information and rely more on partly irrelevant parts of a graph. Finally, we discuss how the findings can be used to design experimental studies and innovative instructional procedures in higher education.
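As a minimal sketch of the quantities this kind of analysis starts from, fixation and transition counts can be derived from a scanpath, i.e. the sequence of AOI labels of consecutive fixations. The AOI names and the scanpath below are hypothetical, not data from the study:

```python
from collections import Counter

def fixation_and_transition_counts(scanpath):
    """Count fixations per AOI and gaze transitions between consecutive
    fixations on different AOIs, given a scanpath as a list of AOI labels."""
    fixations = Counter(scanpath)
    transitions = Counter(
        (a, b) for a, b in zip(scanpath, scanpath[1:]) if a != b
    )
    return fixations, transitions

# Hypothetical scanpath of one task solver
scanpath = ["question", "graph", "x-axis", "graph", "question", "graph"]
fix, trans = fixation_and_transition_counts(scanpath)
```

Here `trans[("question", "graph")]` counts how often the gaze shifted from the question text to the graph, one of the transitions on which correct and incorrect solvers differed.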
Volatility clustering and fat tails are prominently observed in financial markets. Here, we analyze the underlying mechanisms of three agent-based models explaining these stylized facts in terms of market instabilities and compare them on empirical grounds. To this end, we first develop a general framework for detecting tail events in stock markets. In particular, we introduce Hawkes processes to automatically identify and date onsets of market turmoils which result in increased volatility. Second, we introduce three different indicators to predict those onsets. Each of the three indicators is derived from and tailored to one of the models, namely quantifying information content, critical slowing down or market risk perception. Finally, we apply our indicators to simulated and real market data. We find that all indicators reliably predict market events on simulated data and clearly distinguish the different models. In contrast, a systematic comparison on the stocks of the Forbes 500 companies shows a markedly lower performance. Overall, predicting the onset of market turmoils appears difficult, yet, over very short time horizons high or rising volatility exhibits some predictive power.
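The event-detection step rests on the self-exciting conditional intensity of a Hawkes process. A minimal sketch follows; the parameter values are illustrative only, not those fitted in the paper:

```python
import math

def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a univariate Hawkes process:
    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).
    Past events excite the process, so clustered tail events push the
    intensity above the baseline mu, the property used to date onsets
    of market turmoil."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

baseline = hawkes_intensity(10.0, [])               # no past events: equals mu
excited = hawkes_intensity(10.0, [9.7, 9.8, 9.9])   # recent burst of events
```

Comparing the intensity against a threshold then flags the onset of a volatility cluster.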
Monitoring is an indispensable tool for the operation of any large installation of grid or cluster computing, be it high energy physics or elsewhere. Usually, monitoring is configured to collect a small amount of data, just enough to enable detection of abnormal conditions. Once detected, the abnormal condition is handled by gathering all information from the affected components. This data is processed by querying it in a manner similar to a database.
This contribution shows how the metaphor of a debugger (for software applications) can be transferred to a compute cluster. The concepts of variables, assertions and breakpoints that are used in debugging can be applied to monitoring by defining variables as the quantities recorded by monitoring and breakpoints as invariants formulated via these variables. It is found that embedding fragments of a data extracting and reporting tool such as the UNIX tool awk facilitates concise notations for commonly used variables, since tools like awk are designed to process large event streams (in textual representations) with bounded memory. A functional notation, similar to both the pipe notation used in the UNIX shell and the point-free style used in functional programming, simplifies the combination of variables that commonly occurs when formulating breakpoints.
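The combination idea can be sketched in Python, with simple field extractors standing in for embedded awk fragments; the record format and the threshold below are hypothetical:

```python
def compose(*stages):
    """Point-free pipeline: feed a stream of monitoring records through the
    stages left to right, mimicking UNIX pipe notation."""
    def run(events):
        for stage in stages:
            events = stage(events)
        return events
    return run

# "Variables" as lazy stream transformers over textual monitoring records,
# roughly what an awk fragment like '{print $2}' would extract.
def field(i):
    return lambda events: (line.split()[i] for line in events)

def as_float(events):
    return (float(x) for x in events)

def breakpoint_hit(threshold):
    """A breakpoint as an invariant over a variable; True when violated."""
    return lambda values: any(v > threshold for v in values)

# Breakpoint: "load on any node exceeds 0.9" (hypothetical records)
load_too_high = compose(field(1), as_float, breakpoint_hit(0.9))
log = ["node01 0.35", "node02 0.97", "node03 0.40"]
violated = load_too_high(log)
```

Because each stage is a generator, the pipeline processes the event stream with bounded memory, matching the awk-like streaming model described above.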
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for a number of 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In the way it measures the similarities of hypertexts, the article goes beyond existing approaches by examining their structural and semantic aspects intra- and intertextually. In this way it builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
A new method of event characterization based on Deep Learning is presented. The PointNet models can be used for fast, online event-by-event impact parameter determination at the CBM experiment. For this study, UrQMD and the CBM detector simulation are used to generate Au+Au collision events at 10 AGeV which are then used to train and evaluate PointNet based architectures. The models can be trained on features like the hit position of particles in the CBM detector planes, tracks reconstructed from the hits, or combinations thereof. The Deep Learning models reconstruct impact parameters from 2-14 fm with a mean error varying from -0.33 to 0.22 fm. For impact parameters in the range of 5-14 fm, a model which uses the combination of hit and track information of particles has a relative precision of 4-9% and a mean error of -0.33 to 0.13 fm. In the same range of impact parameters, a model with only track information has a relative precision of 4-10% and a mean error of -0.18 to 0.22 fm. This new method of event characterization is shown to be more accurate and less model dependent than conventional methods and can utilize the performance boost of modern GPUs.
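The quoted figures can be computed from a set of predictions. The exact definition of "relative precision" is not spelled out in the abstract; the sketch below assumes one plausible reading (standard deviation of the relative residual), and the impact parameter values are toy numbers:

```python
import statistics

def impact_parameter_metrics(true_b, predicted_b):
    """Assumed definitions: mean error = mean(pred - true) in fm;
    relative precision = stdev of the relative residual (pred - true) / true."""
    errors = [p - t for t, p in zip(true_b, predicted_b)]
    relative = [(p - t) / t for t, p in zip(true_b, predicted_b)]
    return statistics.mean(errors), statistics.stdev(relative)

true_b = [6.0, 8.0, 10.0, 12.0]   # impact parameters in fm (toy values)
pred_b = [6.2, 7.9, 10.3, 11.8]   # hypothetical model outputs
mean_err, rel_prec = impact_parameter_metrics(true_b, pred_b)
```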
The impact of columnar file formats on SQL‐on‐Hadoop engine performance: a study on ORC and Parquet
(2019)
Columnar file formats provide an efficient way to store data to be queried by SQL‐on‐Hadoop engines. Related works consider the performance of processing engine and file format together, which makes it impossible to predict their individual impact. In this work, we propose an alternative approach: by executing each file format on the same processing engine, we compare the different file formats as well as their different parameter settings. We apply our strategy to two processing engines, Hive and SparkSQL, and evaluate the performance of two columnar file formats, ORC and Parquet. We use BigBench (TPCx‐BB), a standardized application‐level benchmark for Big Data scenarios. Our experiments confirm that the file format selection and its configuration significantly affect the overall performance. We show that ORC generally performs better on Hive, whereas Parquet achieves the best performance with SparkSQL. Using ZLIB compression brings up to 60.2% improvement with ORC, while Parquet achieves up to 7% improvement with Snappy. Exceptions are the queries involving text processing, which do not benefit from using any compression.
The specific temporal evolution of bacterial and phage population sizes, in particular bacterial depletion and the emergence of a resistant bacterial population, can be seen as a kinetic fingerprint that depends on the manifold interactions of the specific phage–host pair during the course of infection. We have elaborated such a kinetic fingerprint for a human urinary tract Klebsiella pneumoniae isolate and its phage vB_KpnP_Lessing by a modeling approach based on data from in vitro co-culture. We found a faster depletion of the initially sensitive bacterial population than expected from simple mass action kinetics. A possible explanation for the rapid decline of the bacterial population is a synergistic interaction of phages, which can be a favorable feature for phage therapies. In addition to this interaction characteristic, analysis of the kinetic fingerprint of this bacterium and phage combination revealed several relevant aspects of their population dynamics: a reduction of the bacterial concentration can be achieved only at high multiplicity of infection, whereas bacterial extinction is hardly accomplished. Furthermore, the binding affinity of the phage to bacteria is identified as one of the most crucial parameters for the reduction of the bacterial population size. Thus, kinetic fingerprinting can be used to infer phage–host interactions and to explore emergent dynamics, which facilitates a rational design of phage therapies.
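The simple mass-action baseline that the observed kinetics are compared against can be sketched with a forward-Euler integration. All coefficients below are illustrative, not the fitted values from the study:

```python
def simulate_mass_action(b0, p0, growth=0.7, adsorption=1e-8, burst=100,
                         dt=0.01, t_end=1.0):
    """Forward-Euler sketch of simple mass-action phage-host kinetics:
      dB/dt = growth * B - adsorption * B * P          (bacteria)
      dP/dt = (burst - 1) * adsorption * B * P         (phage)
    where adsorption*B*P is the infection rate and each infection
    consumes one phage and releases `burst` progeny."""
    b, p = b0, p0
    for _ in range(int(t_end / dt)):
        infect = adsorption * b * p
        b += dt * (growth * b - infect)
        p += dt * (burst - 1) * infect
    return b, p

# High multiplicity of infection (P0 >> B0) is needed to deplete bacteria
b_low, _ = simulate_mass_action(1e6, 1e6)    # MOI = 1: bacteria keep growing
b_high, _ = simulate_mass_action(1e6, 1e8)   # MOI = 100: bacteria decline
```

Even this toy model reproduces the qualitative point of the abstract: only a high multiplicity of infection reduces the bacterial population, and the adsorption (binding affinity) constant controls how fast.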
BIOfid is a specialized information service currently being developed to mobilize biodiversity data dormant in printed historical and modern literature and to offer a platform for open access journals on the science of biodiversity. Our team of librarians, computer scientists and biologists produce high-quality text digitizations, develop new text-mining tools and generate detailed ontologies enabling semantic text analysis and semantic search by means of user-specific queries. In a pilot project we focus on German publications on the distribution and ecology of vascular plants, birds, moths and butterflies extending back to the Linnaeus period about 250 years ago. The three organism groups have been selected according to current demands of the relevant research community in Germany. The text corpus defined for this purpose comprises over 400 volumes with more than 100,000 pages to be digitized and will be complemented by journals from other digitization projects, copyright-free and project-related literature. With TextImager (Natural Language Processing & Text Visualization) and TextAnnotator (Discourse Semantic Annotation) we have already extended and launched tools that focus on the text-analytical section of our project. Furthermore, taxonomic and anatomical ontologies elaborated by us for the taxa prioritized by the project’s target group - German institutions and scientists active in biodiversity research - are constantly improved and expanded to maximize scientific data output. Our poster describes the general workflow of our project ranging from literature acquisition via software development, to data availability on the BIOfid web portal (http://biofid.de/), and the implementation into existing platforms which serve to promote global accessibility of biodiversity data.
Risk evaluations for agricultural chemicals are necessary to preserve healthy populations of honey bee colonies. Field studies on whole colonies are limited in behavioural research, while results from lab studies allow only restricted conclusions on whole colony impacts. Methods for automated long-term investigations of behaviours within comb cells, such as brood care, were hitherto missing. In the present study, we demonstrate an innovative video method that enables within-cell analysis in honey bee (Apis mellifera) observation hives to detect chronic sublethal neonicotinoid effects of clothianidin (1 and 10 ppb) and thiacloprid (200 ppb) on worker behaviour and development. In May and June, colonies which were fed 10 ppb clothianidin and 200 ppb thiacloprid in syrup over three weeks showed reduced feeding visits and duration throughout various larval development days (LDDs). On LDD 6 (capping day) total feeding duration did not differ between treatments. Behavioural adaptation was exhibited by nurses in the treatment groups in response to retarded larval development by increasing the overall feeding timespan. Using our machine learning algorithm, we demonstrate a novel method for detecting behaviours in an intact hive that can be applied in a versatile manner to conduct impact analyses of chemicals, pests and other stressors.
Measurement of ϒ(1S) elliptic flow at forward rapidity in Pb-Pb collisions at √sNN = 5.02 TeV
(2019)
The first measurement of the ϒ(1S) elliptic flow coefficient (v2) is performed at forward rapidity (2.5 < y < 4) in Pb–Pb collisions at √sNN = 5.02 TeV with the ALICE detector at the LHC. The results are obtained with the scalar product method and are reported as a function of transverse momentum (pT) up to 15 GeV/c in the 5%–60% centrality interval. The measured ϒ(1S) v2 is consistent with 0 and with the small positive values predicted by transport models within uncertainties. The v2 coefficient in 2 < pT < 15 GeV/c is lower than that of inclusive J/ψ mesons in the same pT interval by 2.6 standard deviations. These results, combined with earlier suppression measurements, are in agreement with a scenario in which the ϒ(1S) production in Pb–Pb collisions at LHC energies is dominated by dissociation limited to the early stage of the collision, whereas in the J/ψ case there is substantial experimental evidence of an additional regeneration component.
Dancing is an activity that positively enhances people’s mood; it consists of feeling the music and expressing it in rhythmic movements with the body. Learning how to dance can be challenging because it requires proper coordination and understanding of rhythm and beat. In this paper, we present the first implementation of the Dancing Coach (DC), a generic system designed to support the practice of dancing steps, which in its current state supports the practice of basic salsa dancing steps. However, the DC has been designed to allow the addition of more dance styles. We also present the first user evaluation of the DC, which consists of user tests with 25 participants. Results from the user test show that participants stated they had learned the basic salsa dancing steps, to move to the beat and to coordinate their body in a fun way. Results also point out some directions for improving future versions of the DC.
Iconographic representations on ancient artifacts are described in many existing databases and literature as human readable text. We applied Natural Language Processing (NLP) approaches in order to extract the semantics out of these textual descriptions and in this way enable semantic searches over them. This allows more sophisticated requests compared to the common existing keyword searches. As we show in our experiments based on numismatic datasets, the approach is generic in the sense that once the system is trained on one dataset, it can be applied without any further manual work also to datasets that have similar content. Of course, additional adaptions would further improve the results. Since the approach requires manual work only during the training phase, it can easily be applied to huge datasets without manual work and therefore without major extra costs. In fact, in our experience bigger datasets generate even better results because there is more data for training. Since our approach is not bound to a certain domain and the numismatic datasets are just an example, it could serve as a blueprint for many other areas. It could also help to build bridges between disciplines since textual iconographic descriptions are to be found also for pottery, sculpture and elsewhere.
Correction to: Scientific Reports https://doi.org/10.1038/s41598-019-43857-5, published online 17 May 2019. In the original version of this Article, Jan-Hendrik Trösemeier was incorrectly affiliated with ‘Division of Allergology, Paul Ehrlich Institut, Langen, Germany’. The correct affiliations are listed below...
Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons’ receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.
In pathology, tissue images are evaluated using a light microscope, relying on the expertise and experience of pathologists. There is a great need for computational methods to quantify and standardize histological observations. Computational quantification methods are becoming more and more essential to evaluate tissue images. In particular, the distribution of tumor cells and their microenvironment are of special interest. Here, we systematically investigated tumor cell properties and their spatial neighborhood relations by a new application of statistical analysis to whole slide images of Hodgkin lymphoma, a tumor arising in lymph nodes, and inflammation of lymph nodes called lymphadenitis. We considered properties of more than 400,000 immunohistochemically stained, CD30-positive cells in 35 whole slide images of tissue sections from subtypes of the classical Hodgkin lymphoma, nodular sclerosis and mixed cellularity, as well as from lymphadenitis. We found that cells exhibited significantly favored and unfavored spatial neighborhood relations depending on their morphology. This information is important to evaluate differences between Hodgkin lymph nodes infiltrated by tumor cells (Hodgkin lymphoma) and inflamed lymph nodes, concerning the neighborhood relations and the sizes of cells. The quantification of neighborhood relations revealed new insights into the relations of CD30-positive cells in different diagnosis cases. The approach is general and can easily be applied to whole slide image analysis of other tumor types.
The morphology of presynaptic specializations can vary greatly ranging from classical single-release-site boutons in the central nervous system to boutons of various sizes harboring multiple vesicle release sites. Multi-release-site boutons can be found in several neural contexts, for example at the neuromuscular junction (NMJ) of body wall muscles of Drosophila larvae. These NMJs are built by two motor neurons forming two types of glutamatergic multi-release-site boutons with two typical diameters. However, it is unknown why these distinct nerve terminal configurations are used on the same postsynaptic muscle fiber. To systematically dissect the biophysical properties of these boutons we developed a full three-dimensional model of such boutons, their release sites and transmitter-harboring vesicles and analyzed the local vesicle dynamics of various configurations during stimulation. Here we show that the rate of transmission of a bouton is primarily limited by diffusion-based vesicle movements and that the probability of vesicle release and the size of a bouton affect bouton-performance in distinct temporal domains allowing for an optimal transmission of the neural signals at different time scales. A comparison of our in silico simulations with in vivo recordings of the natural motor pattern of both neurons revealed that the bouton properties resemble a well-tuned cooperation of the parameters release probability and bouton size, enabling a reliable transmission of the prevailing firing-pattern at diffusion-limited boutons. Our findings indicate that the prevailing firing-pattern of a neuron may determine the physiological and morphological parameters required for its synaptic terminals.
Relying on the theory of Saward (2010) and Disch (2015), we study political representation through the lens of representative claim-making. We identify a gap between the theoretical concept of claim-making and the empirical (quantitative) assessment of representative claims made in the real world’s representative contexts. Therefore, we develop a new approach to map and quantify representative claims in order to subsequently measure the reception and validation of the claims by the audience. To test our method, we analyse all the debates of the German parliament concerned with the introduction of the gender quota in German supervisory boards from 2013 to 2017 in a two-step process. At first, we assess which constituencies the MPs claim to represent and how they justify their stance. Drawing on multiple correspondence analysis, we identify different claim patterns. Second, making use of natural language processing techniques and logistic regression on social media data, we measure if and how the asserted claims in the parliamentary debates are received and validated by the respective audience. We come to the conclusion that the constituency as ultimate judge of legitimacy has not been comprehensively conceptualized yet.
The formulation of the Partial Information Decomposition (PID) framework by Williams and Beer in 2010 attracted a significant amount of attention to the problem of defining redundant (or shared), unique and synergistic (or complementary) components of mutual information that a set of source variables provides about a target. This attention resulted in a number of measures proposed to capture these concepts, theoretical investigations into such measures, and applications to empirical data (in particular to datasets from neuroscience). In this Special Issue on “Information Decomposition of Target Effects from Multi-Source Interactions” at Entropy, we have gathered current work on such information decomposition approaches from many of the leading research groups in the field. We begin our editorial by providing the reader with a review of previous information decomposition research, including an overview of the variety of measures proposed, how they have been interpreted and applied to empirical investigations. We then introduce the articles included in the special issue one by one, providing a similar categorisation of these articles into: i. proposals of new measures; ii. theoretical investigations into properties and interpretations of such approaches, and iii. applications of these measures in empirical studies. We finish by providing an outlook on the future of the field.
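The motivating phenomenon behind PID can be made concrete with the textbook XOR example: each source alone carries zero mutual information about the target, yet the pair determines it completely, so all the information is synergistic. A small sketch with a plug-in mutual information estimate:

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(X;Y) in bits from a list of (x, y) samples (plug-in estimate)."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# XOR target over two uniform binary sources
samples = [((x1, x2), x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
i_joint = mutual_information(samples)                            # 1 bit
i_x1 = mutual_information([(x1, y) for (x1, _), y in samples])   # 0 bits
i_x2 = mutual_information([(x2, y) for (_, x2), y in samples])   # 0 bits
```

The gap between the joint information (1 bit) and the sum of the individual informations (0 bits) is exactly what PID measures try to decompose into redundant, unique and synergistic parts.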
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open them to a larger research community. Our contribution is two-fold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Anaplastic large cell lymphoma (ALCL) and classical Hodgkin lymphoma (cHL) are lymphomas that contain CD30-expressing tumor cells and have numerous pathological similarities. Whereas ALCL is usually diagnosed at an advanced stage, cHL more frequently presents with localized disease. The aim of the present study was to elucidate the mechanisms underlying the different clinical presentation of ALCL and cHL. Chemokine and chemokine receptor expression were similar in primary ALCL and cHL cases apart from the known overexpression of the chemokines CCL17 and CCL22 in the Hodgkin and Reed-Sternberg (HRS) cells of cHL. Consistent with the overexpression of these chemokines, primary cHL cases encountered a significantly denser T cell microenvironment than ALCL. In addition to differences in the interaction with their microenvironment, cHL cell lines presented a lower and less efficient intrinsic cell motility than ALCL cell lines, as assessed by time-lapse microscopy in a collagen gel and transwell migration assays. We thus propose that the combination of impaired basal cell motility and differences in the interaction with the microenvironment hamper the dissemination of HRS cells in cHL when compared with the tumor cells of ALCL.
The development of multimodal sensor-based applications designed to support learners with the improvement of their skills is expensive since most of these applications are tailor-made and built from scratch. In this paper, we show how the Presentation Trainer (PT), a multimodal sensor-based application designed to support the development of public speaking skills, can be modularly extended with a Virtual Reality real-time feedback module (VR module), which makes usage of the PT more immersive and comprehensive. The described study consists of a formative evaluation and has two main objectives. Firstly, a technical objective is concerned with the feasibility of extending the PT with an immersive VR Module. Secondly, a user experience objective focuses on the level of satisfaction of interacting with the VR extended PT. To study these objectives, we conducted user tests with 20 participants. Results from our test show the feasibility of modularly extending existing multimodal sensor-based applications, and in terms of learning and user experience, results indicate a positive attitude of the participants towards using the application (PT+VR module).
Background: Microarray analysis represents a powerful way to test scientific hypotheses on the functionality of cells. The measurements consider the whole genome, and the large amount of generated data requires sophisticated analysis. To date, no gold standard for the analysis of microarray images has been established. Due to the lack of a standard approach, there is a strong need to identify new processing algorithms.
Methods: We propose a novel approach based on hyperbolic partial differential equations (PDEs) for unsupervised spot segmentation. Prior to segmentation, morphological operations were applied for the identification of co-localized groups of spots. A grid alignment was performed to determine the borderlines between rows and columns of spots. PDEs were applied to detect the inflection points within each column and row; vertical and horizontal luminance profiles were evolved respectively. The inflection points of the profiles determined borderlines that confined a spot within adapted rectangular areas. A subsequent k-means clustering determined the pixels of each individual spot and its local background.
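The borderline-detection step can be illustrated with a much simpler stand-in for the PDE evolution: on a 1D luminance profile (e.g. row- or column-wise sums of the image), valleys between bright spots are candidate borderlines. The profile values below are toy numbers:

```python
def borderline_candidates(profile):
    """Indices of local minima of a 1D luminance profile: dark valleys
    between bright spots mark candidate borderlines between spot rows or
    columns. A simplified stand-in for the paper's PDE-based detection of
    inflection points, not the method itself."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]]

# Two bright spots separated by a dark gap (toy luminance values)
profile = [2, 8, 14, 8, 2, 1, 3, 9, 15, 9, 3]
cuts = borderline_candidates(profile)
```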
Results: We evaluated the approach for a data set of microarray images taken from the Stanford Microarray Database (SMD). The data set is based on two studies on global gene expression profiles of Arabidopsis thaliana. We computed values for spot intensity, regression ratio, and coefficient of determination. For spots with irregular contours and inner holes, we found intensity values that were significantly different from those determined by the GenePix Pro microarray analysis software. We determined the set of differentially expressed genes from our intensities and identified more activated genes than were predicted by the GenePix software.
Conclusions: Our method represents a worthwhile alternative and complement to standard approaches used in industry and academia. We highlight the importance of our spot segmentation approach, which identified additional important genes and thus better explains the molecular mechanisms activated in defense responses to virus and pathogen infection.
The production of K∗(892)0 and ϕ(1020) mesons has been measured in p–Pb collisions at √sNN = 5.02 TeV. K∗0 and ϕ are reconstructed via their decay into charged hadrons with the ALICE detector in the rapidity range - 0.5 < y < 0. The transverse momentum spectra, measured as a function of the multiplicity, have a pT range from 0 to 15 GeV/c for K∗0 and from 0.3 to 21 GeV/c for ϕ. Integrated yields, mean transverse momenta and particle ratios are reported and compared with results in pp collisions at √s = 7 TeV and Pb–Pb collisions at √sNN = 2.76 TeV. In Pb–Pb and p–Pb collisions, K∗0 and ϕ probe the hadronic phase of the system and contribute to the study of particle formation mechanisms by comparison with other identified hadrons. For this purpose, the mean transverse momenta and the differential proton-to-ϕ ratio are discussed as a function of the multiplicity of the event. The short-lived K∗0 is measured to investigate re-scattering effects, believed to be related to the size of the system and to the lifetime of the hadronic phase.
Heterologously expressed genes require adaptation to the host organism to ensure adequate levels of protein synthesis, which is typically approached by replacing codons by the target organism’s preferred codons. In view of frequently encountered suboptimal outcomes we introduce the codon-specific elongation model (COSEM) as an alternative concept. COSEM simulates ribosome dynamics during mRNA translation and informs about protein synthesis rates per mRNA in an organism- and context-dependent way. Protein synthesis rates from COSEM are integrated with further relevant covariates such as translation accuracy into a protein expression score that we use for codon optimization. The scoring algorithm further enables fine-tuning of protein expression including deoptimization and is implemented in the software OCTOPOS. The protein expression score produces competitive predictions on proteomic data from prokaryotic, eukaryotic, and human expression systems. In addition, we optimized and tested heterologous expression of manA and ova genes in Salmonella enterica serovar Typhimurium. Superiority over standard methodology was demonstrated by a threefold increase in protein yield compared to wildtype and commercially optimized sequences.
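The conventional baseline that COSEM is contrasted with, replacing each codon by the host's preferred synonymous codon, can be sketched as follows. The usage table is a small hypothetical excerpt, not the codon-usage data of any real organism:

```python
# Hypothetical host codon-usage table: amino acid -> {codon: relative frequency}
USAGE = {
    "L": {"CTG": 0.50, "TTA": 0.13, "CTT": 0.10},
    "K": {"AAA": 0.76, "AAG": 0.24},
    "F": {"TTT": 0.58, "TTC": 0.42},
}
CODON_TO_AA = {codon: aa for aa, table in USAGE.items() for codon in table}

def preferred_codon_optimize(seq):
    """Replace every codon with the host's most frequent synonymous codon,
    the standard approach the abstract describes (and that COSEM refines)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(max(USAGE[CODON_TO_AA[c]], key=USAGE[CODON_TO_AA[c]].get)
                   for c in codons)
```

COSEM goes beyond this per-codon rule by simulating ribosome dynamics along the whole mRNA, so the score of a codon depends on its context rather than on a static frequency table.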
CRFVoter : gene and protein related object recognition using a conglomerate of CRF-based tools
(2019)
Background: Gene and protein related objects are an important class of entities in biomedical research, whose identification and extraction from scientific articles is attracting increasing interest. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of gene and protein related objects. For this purpose, we transform the task as posed by BioCreative V.5 into a sequence labeling problem. We present a series of sequence labeling systems that we used and adapted in our experiments for solving this task. Our experiments show how to optimize the hyperparameters of the classifiers involved. To this end, we utilize various algorithms for hyperparameter optimization. Finally, we present CRFVoter, a two-stage application of Conditional Random Field (CRF) that integrates the optimized sequence labelers from our study into one ensemble classifier.
Results: We analyze the impact of hyperparameter optimization regarding named entity recognition in biomedical research and show that this optimization results in a performance increase of up to 60%. In our evaluation, our ensemble classifier based on multiple sequence labelers, called CRFVoter, outperforms each individual extractor’s performance. For the blinded test set provided by the BioCreative organizers, CRFVoter achieves an F-score of 75%, a recall of 71% and a precision of 80%. For the GPRO type 1 evaluation, CRFVoter achieves an F-score of 73%, a recall of 70% and the best precision (77%) among all task participants.
Conclusion: CRFVoter is effective when multiple sequence labeling systems are to be used and performs better than each of the individual systems it combines.
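CRFVoter’s second stage is itself a trained Conditional Random Field over the base labelers’ outputs; as a much simpler stand-in for that combination step, per-token (weighted) majority voting over the base systems’ label sequences can be sketched as follows (function name and BIO labels are illustrative, not from the paper):

```python
from collections import Counter

def vote(predictions, weights=None):
    """Combine per-token label sequences from several sequence labelers
    by weighted majority vote.

    `predictions` is a list of equal-length label sequences, one per
    base system; `weights` optionally scores each system (e.g. by its
    development-set F-score). Returns the combined label sequence.
    """
    weights = weights or [1.0] * len(predictions)
    n_tokens = len(predictions[0])
    combined = []
    for t in range(n_tokens):
        tally = Counter()
        for seq, w in zip(predictions, weights):
            tally[seq[t]] += w
        combined.append(tally.most_common(1)[0][0])
    return combined
```

For example, three labelers that disagree token-wise still produce a single consensus sequence; a learned second-stage CRF additionally repairs label-transition inconsistencies that plain voting cannot.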
The hepatitis C virus (HCV) RNA replication cycle is a dynamic intracellular process occurring in three-dimensional space (3D), which is difficult both to capture experimentally and to visualize conceptually. HCV-generated replication factories are housed within virus-induced intracellular structures termed membranous webs (MW), which are derived from the endoplasmic reticulum (ER). Recently, we published 3D spatiotemporally resolved diffusion–reaction models of the HCV RNA replication cycle by means of surface partial differential equation (sPDE) descriptions. We distinguished between the basic components of the HCV RNA replication cycle, namely HCV RNA, non-structural viral proteins (NSPs), and a host factor. In particular, we evaluated the sPDE models upon realistically reconstructed intracellular compartments (ER/MW). In this paper, we propose a significant extension of the model based upon two additional features: different aggregate states of HCV RNA and NSPs, and population-dynamics-inspired diffusion and reaction coefficients instead of multilinear ones. The combination of both aspects enables realistic modeling of viral replication at all scales. Specifically, we describe a replication complex state consisting of HCV RNA together with a defined amount of NSPs. As a result of the combination of spatial resolution and different aggregate states, the new model mimics a cis requirement for HCV RNA replication. We used heuristic parameters for our simulations, which were run only on a subsection of the ER. Nevertheless, this was sufficient to allow the fitting of core aspects of virus reproduction, at least qualitatively. Our findings should help stimulate new model approaches and experimental directions for virology.
LSTMVoter : chemical named entity recognition using a conglomerate of sequence labeling tools
(2019)
Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier.
Results: We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores information about features that is modeled by means of an attention mechanism. LSTMVoter outperforms each extractor integrated by it in a series of experiments. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%.
Availability and implementation: Data and code are available at https://github.com/texttechnologylab/LSTMVoter.
The endoplasmic reticulum–mitochondria encounter structure (ERMES) connects the mitochondrial outer membrane with the ER. Multiple functions have been linked to ERMES, including maintenance of mitochondrial morphology, protein assembly and phospholipid homeostasis. Since the mitochondrial distribution and morphology protein Mdm10 is present in both ERMES and the mitochondrial sorting and assembly machinery (SAM), it is unknown how the ERMES functions are connected on a molecular level. Here we report that conserved surface areas on opposite sides of the Mdm10 β-barrel interact with SAM and ERMES, respectively. We generated point mutants to separate protein assembly (SAM) from morphology and phospholipid homeostasis (ERMES). Our study reveals that the β-barrel channel of Mdm10 serves different functions. Mdm10 promotes the biogenesis of α-helical and β-barrel proteins at SAM and functions as integral membrane anchor of ERMES, demonstrating that SAM-mediated protein assembly is distinct from ER-mitochondria contact sites.
In this paper, we study the limit of compactness, a graph index originally introduced for measuring structural characteristics of hypermedia. Applying compactness to large-scale small-world graphs, Mehler (2008) observed its limit behaviour to equal 1. The striking question raised by this finding was whether this limit behaviour resulted from the specifics of small-world graphs or was simply an artefact. In this paper, we determine the necessary and sufficient conditions for any sequence of connected graphs to result in a limit value of CB = 1, which can be generalized with some consideration to the case of disconnected graph classes (Theorem 3). This result can be applied to many well-known classes of connected graphs. Here, we illustrate it by considering four examples. In fact, our proof-theoretical approach allows the limit value of compactness to be obtained quickly for many graph classes, sparing computational costs.
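For reference, compactness as originally introduced for hypertext by Botafogo, Rivlin and Shneiderman (1992) — which, to our understanding, is the index whose limit is studied here — can be written as:

```latex
C_B \;=\; \frac{\mathrm{Max} - \sum_{i}\sum_{j \neq i} d_{ij}}{\mathrm{Max} - \mathrm{Min}},
\qquad
\mathrm{Max} = (n^2 - n)\,K, \quad \mathrm{Min} = n^2 - n,
```

where $n$ is the number of nodes, $d_{ij}$ is the converted distance from node $i$ to node $j$ (unreachable pairs are assigned a penalty constant $K$, commonly $K = n$). If every converted distance equals its minimum of 1, the sum equals $\mathrm{Min}$ and $C_B = 1$; if all pairs are unreachable, the sum equals $\mathrm{Max}$ and $C_B = 0$. The limit value 1 thus signals that the average pairwise distance approaches its minimum as the graphs grow.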
Background: Although mortality after cardiac surgery has significantly decreased in the last decade, patients still experience clinically relevant postoperative complications. Among others, atrial fibrillation (AF) is a common consequence of cardiac surgery, which is associated with prolonged hospitalization and increased mortality.
Methods: We retrospectively analyzed data from patients who underwent coronary artery bypass grafting, valve surgery or a combination of both at the University Hospital Muenster between April 2014 and July 2015. We evaluated the incidence of new onset and intermittent/permanent AF (patients with pre- and postoperative AF). Furthermore, we investigated the impact of postoperative AF on clinical outcomes and evaluated potential risk factors.
Results: In total, 999 patients were included in the analysis. New onset AF occurred in 24.9% of the patients and the incidence of intermittent/permanent AF was 59.5%. Both types of postoperative AF were associated with prolonged ICU length of stay (median increase approx. 2 days) and duration of mechanical ventilation (median increase 1 h). Additionally, new onset AF patients had a higher rate of dialysis and hospital mortality and more positive fluid balance on the day of surgery and postoperative days 1 and 2. In a multiple logistic regression model, advanced age (odds ratio (OR) = 1.448 per decade increase, p < 0.0001), a combination of CABG and valve surgery (OR = 1.711, p = 0.047), higher C-reactive protein (OR = 1.06 per unit increase, p < 0.0001) and creatinine plasma concentration (OR = 1.287 per unit increase, p = 0.032) significantly predicted new onset AF. Higher Horowitz index values were associated with a reduced risk (OR = 0.996 per unit increase, p = 0.012). In a separate model, higher plasma creatinine concentration (OR = 2.125 per unit increase, p = 0.022) was a significant risk factor for intermittent/permanent AF whereas higher plasma phosphate concentration (OR = 0.522 per unit increase, p = 0.003) indicated reduced occurrence of this arrhythmia.
Conclusions: New onset and intermittent/permanent AF are associated with adverse clinical outcomes of elective cardiac surgery patients. Different risk factors implicated in postoperative AF suggest different mechanisms might be involved in its pathogenesis. Customized clinical management protocols seem to be warranted for a higher success rate of prevention and treatment of postoperative AF.
The transverse momentum distributions of the strange and double-strange hyperon resonances (Σ(1385)±,Ξ(1530)0) produced in p–Pb collisions at √sNN = 5.02 TeV were measured in the rapidity range −0.5<yCMS<0 for event classes corresponding to different charged-particle multiplicity densities, ⟨dNch/dηlab⟩. The mean transverse momentum values are presented as a function of ⟨dNch/dηlab⟩, as well as a function of the particle masses and compared with previous results on hyperon production. The integrated yield ratios of excited to ground-state hyperons are constant as a function of ⟨dNch/dηlab⟩. The equivalent ratios to pions exhibit an increase with ⟨dNch/dηlab⟩, depending on their strangeness content.
Motivation: Arabidopsis thaliana is a well-established model system for the analysis of the basic physiological and metabolic pathways of plants. Nevertheless, the system is not yet fully understood, although many mechanisms are described and information for many processes exists. The combination and interpretation of the large amount of biological data remain a big challenge, however, not only because data sets for metabolic pathways are still incomplete, but also because they are often inconsistent, coming as they do from different experiments of various scales regarding, for example, accuracy and/or significance. Here, theoretical modeling is powerful for formulating hypotheses about pathways and the dynamics of the metabolism, even if the biological data are incomplete. To be reliable, mathematical models have to be proven for consistency. This is still a challenging task, because many verification techniques fail already for middle-sized models. Consequently, new methods, like decomposition methods or reduction approaches, are being developed to circumvent this problem.
Methods: We present a new semi-quantitative mathematical model of the metabolism of Arabidopsis thaliana. We used the Petri net formalism to express the complex reaction system in a mathematically unique manner. To verify the model for correctness and consistency we applied concepts of network decomposition and network reduction such as transition invariants, common transition pairs, and invariant transition pairs.
Results: We formulated the core metabolism of Arabidopsis thaliana based on recent knowledge from literature, including the Calvin cycle, glycolysis and citric acid cycle, glyoxylate cycle, urea cycle, sucrose synthesis, and the starch metabolism. By applying network decomposition and reduction techniques at steady-state conditions, we suggest a straightforward mathematical modeling process. We demonstrate that potential steady-state pathways exist, which provide the fixed carbon to nearly all parts of the network, especially to the citric acid cycle. There is a close cooperation of important metabolic pathways, e.g., the de novo synthesis of uridine-5-monophosphate, the γ-aminobutyric acid shunt, and the urea cycle. The presented approach extends the established methods for a feasible interpretation of biological network models, in particular of large and complex models.
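The transition invariants (T-invariants) used above for verification are non-negative integer transition vectors x with C·x = 0, where C is the place-by-transition incidence matrix; firing a T-invariant’s transitions (in a suitable order) reproduces the marking, which is why T-invariants characterize steady-state pathways. A minimal membership check, demonstrated on a made-up three-place cycle net (not part of the Arabidopsis model):

```python
def is_t_invariant(incidence, x):
    """Check whether x is a T-invariant of a Petri net.

    `incidence` is the place-by-transition incidence matrix C (rows =
    places, columns = transitions); x must be a non-negative integer
    vector with C.x = 0, i.e. every place's token balance is zero.
    """
    if any(v < 0 for v in x):
        return False
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in incidence)

# Hypothetical cycle net: t1: p1->p2, t2: p2->p3, t3: p3->p1
CYCLE = [
    [-1, 0, 1],   # p1: consumed by t1, produced by t3
    [1, -1, 0],   # p2: produced by t1, consumed by t2
    [0, 1, -1],   # p3: produced by t2, consumed by t3
]
```

Firing each transition once (x = (1, 1, 1)) restores the marking of the cycle, so it is a T-invariant; firing only t1 is not.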
Background: Signal transduction pathways are important cellular processes to maintain the cell’s integrity. Their imbalance can cause severe pathologies. As signal transduction pathways feature complex regulations, they form intertwined networks. Mathematical models aim to capture their regulatory logic and allow an unbiased analysis of robustness and vulnerability of the signaling network. Pathway detection is yet a challenge for the analysis of signaling networks in the field of systems biology. A rigorous mathematical formalism is lacking to identify all possible signal flows in a network model.
Results: In this paper, we introduce the concept of Manatee invariants for the analysis of signal transduction networks. We present an algorithm for the characterization of the combinatorial diversity of signal flows, e.g., from signal reception to cellular response. We demonstrate the concept for a small model of the TNFR1-mediated NF-κB signaling pathway. Manatee invariants reveal all possible signal flows in the network. Further, we show the application of Manatee invariants for in silico knockout experiments. Here, we illustrate the biological relevance of the concept.
Conclusions: The proposed mathematical framework reveals the entire variety of signal flows in models of signaling systems, including cyclic regulations. Thereby, Manatee invariants allow for the analysis of robustness and vulnerability of signaling networks. The application to further analyses such as for in silico knockout was shown. The new framework of Manatee invariants contributes to an advanced examination of signaling systems.
Exploring biophysical properties of virus-encoded components and their requirement for virus replication is an exciting new area of interdisciplinary virological research. To date, spatial resolution has only rarely been analyzed in computational/biophysical descriptions of virus replication dynamics. However, it is widely acknowledged that intracellular spatial dependence is a crucial component of virus life cycles. The hepatitis C virus-encoded NS5A protein is an endoplasmic reticulum (ER)-anchored viral protein and an essential component of the virus replication machinery. Therefore, we simulate NS5A dynamics on realistically reconstructed, curved ER surfaces by means of surface partial differential equations (sPDE) upon unstructured grids. We match the in silico NS5A diffusion constant such that the NS5A sPDE simulation data reproduce experimental NS5A fluorescence recovery after photobleaching (FRAP) time series data. This parameter estimation yields the NS5A diffusion constant. Such parameters are needed for spatial models of HCV dynamics, which we are developing in parallel but which remain qualitative at this stage. Thus, our present study likely provides the first quantitative biophysical description of the movement of a viral component. Our spatio-temporally resolved ansatz paves new ways for understanding intricate spatially defined processes central to specific aspects of virus life cycles.
Data structures and advanced models of computation on big data : report from Dagstuhl seminar 14091
(2014)
This report documents the program and the outcomes of Dagstuhl Seminar 14091 "Data Structures and Advanced Models of Computation on Big Data". In today's computing environment vast amounts of data are processed, exchanged and analyzed. The manner in which information is stored profoundly influences the efficiency of these operations over the data. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances.
The seminar covered both recent advances in the "classical" data structuring topics as well as new models of computation adapted to modern architectures, scientific studies that reveal the need for such models, applications where large data sets play a central role, modern computing platforms for very large data, and new data structures for large data in modern architectures.
The extended abstracts included in this report present recent state-of-the-art advances and lay the foundation for new directions within data structures research.
Synaptic release sites are characterized by exocytosis-competent synaptic vesicles tightly anchored to the presynaptic active zone (PAZ), whose proteome orchestrates the fast signaling events involved in the synaptic vesicle cycle and plasticity. Allocation of the amyloid precursor protein (APP) to the PAZ proteome implicated a functional impact of APP in neuronal communication. In this study, we combined state-of-the-art proteomics, electrophysiology and bioinformatics to address protein abundance and functional changes at the native hippocampal PAZ in young and old APP-KO mice. We evaluated whether APP deletion has an impact on the metabolic activity of presynaptic mitochondria. Furthermore, we quantified differences in the phosphorylation status after long-term potentiation (LTP) induction at the purified native PAZ. We observed an increase in the phosphorylation of the signaling enzyme calmodulin-dependent kinase II (CaMKII) only in old APP-KO mice. During aging, APP deletion is accompanied by a severe decrease in metabolic activity and hyperphosphorylation of CaMKII. This attributes an essential functional role to APP at the hippocampal PAZ and suggests putative molecular mechanisms underlying the age-dependent impairments in learning and memory in APP-KO mice.
We study Gaifman locality and Hanf locality of an extension of first-order logic with modulo p counting quantifiers (FO+MODp, for short) with arbitrary numerical predicates. We require that the validity of formulas is independent of the particular interpretation of the numerical predicates and refer to such formulas as arb-invariant formulas. This paper gives a detailed picture of locality and non-locality properties of arb-invariant FO+MODp. For example, on the class of all finite structures, for any p ≥ 2, arb-invariant FO+MODp is neither Hanf nor Gaifman local with respect to a sublinear locality radius. However, in case that p is an odd prime power, it is weakly Gaifman local with a polylogarithmic locality radius. And when restricting attention to the class of string structures, for odd prime powers p, arb-invariant FO+MODp is both Hanf and Gaifman local with a polylogarithmic locality radius. Our negative results build on examples of order-invariant FO+MODp formulas presented in Niemistö’s PhD thesis. Our positive results make use of the close connection between FO+MODp and Boolean circuits built from NOT-gates and AND-, OR-, and MODp-gates of arbitrary fan-in.
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dictionary lookup’ strategy performs in this context and find that such an approach can significantly outperform baselines such as edit distance, weighted edit distance, and the noisy-channel model of Brill and Moore for spelling error correction. We also consider elementary combination techniques for our models, such as language-model-weighted majority voting and center string combination. Finally, we consider real-world OCR post-correction for a dataset sampled from medieval Latin texts.
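The edit-distance baseline and the ‘k-best plus dictionary lookup’ idea can be sketched in a few lines — here with plain (unweighted) Levenshtein distance standing in for the learned string-to-string models, and a toy dictionary:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def k_best_corrections(word, dictionary, k=3):
    """Rank dictionary entries by edit distance to `word` and return the
    k best candidates — the 'k-best decoding plus dictionary lookup'
    baseline in miniature."""
    return sorted(dictionary, key=lambda w: edit_distance(word, w))[:k]
```

A weighted variant would replace the unit costs with learned confusion probabilities, which is essentially where the noisy-channel models of the paper improve on this baseline.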
Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities - bridging the gap between Computer Science and Digital Humanities”.
1998 ACM Subject Classification I.2.7 Natural Language Processing, J.5 Arts and Humanities
We present results on transverse momentum (pT) and rapidity (y) differential production cross sections, mean transverse momentum and mean transverse momentum square of inclusive J/ψ and ψ(2S) at forward rapidity (2.5 < y < 4) as well as ψ(2S)-to-J/ψ cross section ratios. These quantities are measured in pp collisions at center-of-mass energies √s = 5.02 and 13 TeV with the ALICE detector. Both charmonium states are reconstructed in the dimuon decay channel, using the muon spectrometer. A comprehensive comparison to inclusive charmonium cross sections measured at √s = 2.76, 7 and 8 TeV is performed. A comparison to non-relativistic quantum chromodynamics and fixed-order next-to-leading logarithm calculations, which describe prompt and non-prompt charmonium production respectively, is also presented. A good description of the data is obtained over the full pT range, provided that both contributions are summed. In particular, it is found that for pT > 15 GeV/c the non-prompt contribution reaches up to 50% of the total charmonium yield.
This is a short summary of a recent survey [FR03] focusing on the observed evidence that Internet connectivity is positively correlated with the spread of democracy at high levels of significance. The results of the multivariate correlation analyses and probability regression estimation models are based on the combined analysis of mid-1991 to 2001 data series from Eurostat, the US Census Bureau, the World Bank, and the OECD’s statistical data services, which track the growth of information technology and ratings of freedom and democracy worldwide.
We present an implementation of an interpreter LRPi for the call-by-need calculus LRP, based on a variant of Sestoft's abstract machine Mark 1, extended with an eager garbage collector. It is used as a tool for exact space usage analyses as a support for our investigations into space improvements of call-by-need calculi.
50 years of amino acid hydrophobicity scales : revisiting the capacity for peptide classification
(2016)
Background: Physicochemical properties are frequently analyzed to characterize protein sequences of known and unknown function. Especially the hydrophobicity of amino acids is often used for structural prediction or for the detection of membrane-associated or embedded β-sheets and α-helices. For this purpose many scales classifying amino acids according to their physicochemical properties have been defined over the past decades. In parallel, several hydrophobicity parameters have been defined for calculation of peptide properties. We analyzed the performance of separating sequence pools using 98 hydrophobicity scales and five different hydrophobicity parameters, namely the overall hydrophobicity, the hydrophobic moment for detection of the α-helical and β-sheet membrane segments, the alternating hydrophobicity and the exact β-strand score.
Results: Most of the scales are capable of discriminating between transmembrane α-helices and transmembrane β-sheets, but assignment of peptides to pools of soluble peptides of different secondary structures is not achieved at the same quality. The separation capacity, as a measure of the discrimination between different structural elements, is best when using the five different hydrophobicity parameters, but addition of the alternating hydrophobicity does not provide a large benefit. An in silico evolutionary approach shows that scales have limitations in separation capacity, with a maximal threshold of 0.6 in general. We observed that scales derived from the evolutionary approach performed best in separating the different peptide pools when the values for arginine and tyrosine were largely distinct from the value of glutamate. Finally, the separation of secondary structure pools via hydrophobicity can be supported by specific detectable patterns of four amino acids.
Conclusion: It could be assumed that the quality of separation capacity of a certain scale depends on the spacing of the hydrophobicity value of certain amino acids. Irrespective of the wealth of hydrophobicity scales a scale separating all different kinds of secondary structures or between soluble and transmembrane peptides does not exist reflecting that properties other than hydrophobicity affect secondary structure formation as well. Nevertheless, application of hydrophobicity scales allows distinguishing between peptides with transmembrane α-helices and β-sheets. Furthermore, the overall separation capacity score of 0.6 using different hydrophobicity parameters could be assisted by pattern search on the protein sequence level for specific peptides with a length of four amino acids.
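One of the five hydrophobicity parameters above, the hydrophobic moment, is conventionally computed as the magnitude of the vector sum of per-residue hydrophobicities rotated by a fixed turn angle per position (100° for an α-helix, 180° for a β-strand). A sketch using a small subset of the Kyte–Doolittle scale (one of the 98 scales analyzed; the subset and sequences below are illustrative):

```python
import math

# Subset of Kyte-Doolittle hydropathy values (illustrative; a real
# analysis would carry all twenty residues)
KD = {"L": 3.8, "K": -3.9, "A": 1.8, "E": -3.5, "F": 2.8, "I": 4.5}

def hydrophobic_moment(seq, delta_deg=100.0):
    """Magnitude of the hydrophobicity vector sum, rotating each residue
    by delta_deg degrees: 100 for an alpha-helix, 180 for a beta-strand."""
    sx = sum(KD[r] * math.cos(math.radians(delta_deg * n)) for n, r in enumerate(seq))
    sy = sum(KD[r] * math.sin(math.radians(delta_deg * n)) for n, r in enumerate(seq))
    return math.hypot(sx, sy)
```

An alternating polar/apolar sequence scored with the β-strand angle of 180° yields a large moment (one hydrophobic face), while a uniform sequence yields a moment near zero — the basic signal used to detect membrane-embedded β-sheets.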
The amyloid precursor protein (APP) was discovered in the 1980s as the precursor protein of the amyloid A4 peptide. The amyloid A4 peptide, also known as A-beta (Aβ), is the main constituent of senile plaques implicated in Alzheimer’s disease (AD). In association with the amyloid deposits, increasing impairments in learning and memory as well as the degeneration of neurons especially in the hippocampus formation are hallmarks of the pathogenesis of AD. Within the last decades much effort has been expended into understanding the pathogenesis of AD. However, little is known about the physiological role of APP within the central nervous system (CNS). Allocating APP to the proteome of the highly dynamic presynaptic active zone (PAZ) identified APP as a novel player within this neuronal communication and signaling network. The analysis of the hippocampal PAZ proteome derived from APP-mutant mice demonstrates that APP is tightly embedded in the underlying protein network. Strikingly, APP deletion accounts for major dysregulation within the PAZ proteome network. Ca2+-homeostasis, neurotransmitter release and mitochondrial function are affected and resemble the outcome during the pathogenesis of AD. The observed changes in protein abundance that occur in the absence of APP as well as in AD suggest that APP is a structural and functional regulator within the hippocampal PAZ proteome. Within this review article, we intend to introduce APP as an important player within the hippocampal PAZ proteome and to outline the impact of APP deletion on individual PAZ proteome subcommunities.
The degradation of cytosol-invading pathogens by autophagy, a process known as xenophagy, is an important mechanism of the innate immune system. Inside the host, Salmonella Typhimurium invades epithelial cells and resides within a specialized intracellular compartment, the Salmonella-containing vacuole. A fraction of these bacteria does not persist inside the vacuole and enters the host cytosol. Salmonella Typhimurium that invades the host cytosol becomes a target of the autophagy machinery for degradation. The xenophagy pathway has only recently been discovered, and its exact molecular processes are not entirely characterized; complete kinetic data for each molecular process are not yet available. We developed a mathematical model of the xenophagy pathway to investigate this key defense mechanism. In this paper, we present a Petri net model of Salmonella xenophagy in epithelial cells. The model is based on functional information derived from literature data. It comprises the molecular mechanism of galectin-8-dependent and ubiquitin-dependent autophagy, including regulatory processes, like nutrient-dependent regulation of autophagy and TBK1-dependent activation of the autophagy receptor OPTN. To model the activation of TBK1, we proposed a new mechanism of TBK1 activation, suggesting a spatial and temporal regulation of this process. Using standard Petri net analysis techniques, we found basic functional modules, which describe different pathways of the autophagic capture of Salmonella and reflect the basic dynamics of the system. To verify the model, we performed in silico knockout experiments. We introduced a new concept of knockout analysis to systematically compute and visualize the results, using an in silico knockout matrix. The results of the in silico knockout analyses were consistent with published experimental results and provide a basis for future investigations of the Salmonella xenophagy pathway.
Author Summary
Salmonellae are Gram-negative bacteria, which cause the majority of foodborne diseases worldwide. Serovars of Salmonella cause a broad range of diseases, ranging from diarrhea to typhoid fever in a variety of hosts. In the year 2010, Salmonella Typhi caused 7.6 million foodborne diseases and 52 000 deaths, and Salmonella enterica was responsible for 78.7 million diseases and 59 000 deaths. After invasion of Salmonella into host epithelial cells, a small fraction of Salmonella escapes from a specialized intracellular compartment and replicates inside the host cytosol. Xenophagy is a host defense mechanism to protect the host cell from cytosolic pathogens. Understanding how Salmonella is recognized and targeted for xenophagy is an important subject of current research. To the best of our knowledge, no mathematical model has been presented so far, describing the process of Salmonella Typhimurium xenophagy. Here, we present a manually curated and mathematically verified theoretical model of Salmonella Typhimurium xenophagy in epithelial cells, which is consistent with the current state of knowledge. Our model reproduces literature data and postulates new hypotheses for future investigations.
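The in silico knockout matrix described above admits a simplified reading in terms of the net’s functional modules (e.g. its T-invariants): after knocking out a transition, a read-out of interest remains supported only if some module avoiding the knocked-out transition still contains it. The sketch below uses this simplified set-based reading, and the transition names are illustrative, not taken from the published model:

```python
def knockout_matrix(invariants, knockouts, readouts):
    """Simplified in silico knockout analysis over functional modules.

    `invariants` is a list of sets of transition names (the modules);
    a read-out r survives the knockout of transition k if some module
    avoiding k contains r. Returns {knockout: {readout: bool}}.
    """
    matrix = {}
    for k in knockouts:
        surviving = [inv for inv in invariants if k not in inv]
        matrix[k] = {r: any(r in inv for inv in surviving) for r in readouts}
    return matrix

# Hypothetical modules: two independent capture pathways
MODULES = [
    {"galectin8", "lc3", "capture"},   # galectin-8-dependent route
    {"ubiquitin", "optn", "capture"},  # ubiquitin/OPTN-dependent route
]
```

Knocking out the galectin-8 branch leaves capture supported via the ubiquitin branch, while knocking out the shared capture step disables it everywhere — the kind of redundancy pattern such a matrix makes visible at a glance.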
The future heavy-ion experiment CBM (FAIR/GSI, Darmstadt, Germany) will focus on the measurement of very rare probes at interaction rates up to 10 MHz with a data flow of up to 1 TB/s. The beam will provide a free stream of beam particles without bunch structure. That requires full online event reconstruction and selection not only in space, but also in time, so-called 4D event building and selection.
The FLES (First-Level Event Selection) reconstruction and selection package consists of several modules: track finding, track fitting, short-lived particle finding, event building and event selection. A time-slice is reconstructed in parallel between cores within the same CPU, thus minimizing the communication between CPUs. After all tracks are found and fitted in 4D, they are collected into clusters of tracks originating from common primary vertices, which are then fitted, thus identifying the 4D interaction points registered within the time-slice. Secondary tracks are associated with primary vertices according to their estimated production time. After that, short-lived particles are found and the full event building process is finished. The last stage of the FLES package is the selection of events according to the requested trigger signatures.
This paper provides a theoretical assessment of gestures in the context of authoring image-related hypertexts, using the example of the museum information system WikiNect. To this end, a first implementation of gestural writing based on image schemata is provided (Lakoff in Women, fire, and dangerous things: what categories reveal about the mind. University of Chicago Press, Chicago, 1987). Gestural writing is defined as a kind of coding in which propositions are expressed solely by means of gestures. In this respect, it is shown that image schemata allow for bridging between natural language predicates and gestural manifestations. Further, it is demonstrated that gestural writing primarily addresses the perceptual level of image descriptions (Hollink et al. in Int J Hum Comput Stud 61(5):601–626, 2004). By exploring the metaphorical potential of image schemata, it is finally illustrated how to extend the expressiveness of gestural writing in order to reach the conceptual level of image descriptions. In this context, the paper paves the way for implementing museum information systems such as WikiNect as systems of kinetic hypertext authoring based on full-fledged gestural writing.
Viruses rely entirely on the host's machinery for the translation of viral transcripts. However, for most viruses infecting humans, codon usage preferences (CUPrefs) do not match those of the host. Human papillomaviruses (HPVs) are a showcase for tackling this paradox: they present a large genotypic diversity and a broad range of phenotypic presentations, from asymptomatic infections to productive lesions and cancer. By applying phylogenetic inference and dimensionality reduction methods, we demonstrate first that genes in HPVs are poorly adapted to the average human CUPrefs, the only exception being capsid genes in viruses causing productive lesions. Phylogenetic relationships between HPVs explained only a small proportion of the variation in CUPrefs. Instead, the most important explanatory factor for viral CUPrefs was infection phenotype, as orthologous genes in viruses with similar clinical presentation displayed similar CUPrefs. Moreover, viral genes with similar spatiotemporal expression patterns also showed similar CUPrefs. Our results suggest that CUPrefs in HPVs reflect either variations in the mutation bias or differential selection pressures depending on the clinical presentation and expression timing. We propose that poor viral CUPrefs may be central to a trade-off between strong viral gene expression and the potential for eliciting a protective immune response.
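A standard way to quantify codon usage preferences of the kind analyzed here is the relative synonymous codon usage (RSCU) of each codon within its synonymous family. The following is a minimal sketch for one amino acid; the input sequence and the restriction to leucine codons are illustrative assumptions, not data from the study.

```python
# Sketch: relative synonymous codon usage (RSCU) for the leucine family.
# RSCU = observed count of a codon / mean count over its synonymous codons.
from collections import Counter

LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}

def rscu(seq, synonymous=LEU_CODONS):
    """Compute RSCU values for one synonymous codon family in a CDS."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in synonymous)
    total = sum(counts.values())
    if total == 0:
        return {}
    mean = total / len(synonymous)
    return {c: counts[c] / mean for c in sorted(synonymous)}

seq = "CTGCTGTTAATGCTG"  # toy CDS: 5 codons, 4 of them leucine
result = rscu(seq)
print(result)
# CTG is strongly preferred (RSCU 4.5), TTA weakly used (1.5), rest 0.0
```

Repeating this per gene and per codon family yields the per-gene CUPrefs vectors to which dimensionality reduction methods can then be applied.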
We provide elementary algorithms for two preservation theorems for first-order sentences (FO) on the class ℭd of all finite structures of degree at most d: For each FO-sentence that is preserved under extensions (homomorphisms) on ℭd, a ℭd-equivalent existential (existential-positive) FO-sentence can be constructed in 5-fold (4-fold) exponential time. This is complemented by lower bounds showing that a 3-fold exponential blow-up in the size of the computed existential (existential-positive) sentence is unavoidable. Both algorithms can be extended (while maintaining the upper and lower bounds on their time complexity) to input first-order sentences with modulo m counting quantifiers (FO+MODm). Furthermore, we show that for an input FO-formula, a ℭd-equivalent Feferman-Vaught decomposition can be computed in 3-fold exponential time. We also provide a matching lower bound.
This paper shows equivalence of several versions of applicative similarity and contextual approximation, and hence also of applicative bisimilarity and contextual equivalence, in LR, the deterministic call-by-need lambda calculus with letrec extended by data constructors, case-expressions and Haskell's seq-operator. LR models an untyped version of the core language of Haskell. The use of bisimilarities simplifies equivalence proofs in calculi and opens a way for more convenient correctness proofs for program transformations. The proof is by a fully abstract and surjective transfer into a call-by-name calculus, which is an extension of Abramsky's lazy lambda calculus. In the latter calculus, equivalence of our similarities and contextual approximation can be shown by Howe's method. Similarity is transferred back to LR on the basis of an inductively defined similarity. The translation from the call-by-need letrec calculus into the extended call-by-name lambda calculus is the composition of two translations. The first translation replaces the call-by-need strategy by a call-by-name strategy; its correctness is shown by exploiting the infinite trees that emerge from unfolding the letrec expressions. The second translation encodes letrec-expressions using multi-fixpoint combinators; its correctness is shown syntactically by comparing reductions in both calculi. A further result of this paper is an isomorphism between the mentioned calculi, which is also the identity on letrec-free expressions.
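The idea behind the second translation, replacing a recursive letrec-style binding with an explicit fixpoint combinator, can be sketched in any language with first-class functions. Python is call-by-value, so the eta-expanded Z combinator is used below; the multi-fixpoint combinators in the paper generalize this encoding to mutually recursive bindings. The example is a generic illustration of the technique, not the paper's translation itself.

```python
# Z combinator: a strict-language fixpoint combinator. Applying Z to a
# functional f yields a function rec satisfying rec = f(rec), without
# any named recursion in the definition.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# "letrec fact = \n. if n == 0 then 1 else n * fact (n - 1)"
# encoded without letrec, via the fixpoint combinator:
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

print(fact(5))  # 120
```

The correctness argument in the paper amounts to showing that reductions of the letrec expression and of its combinator encoding simulate each other.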