Informatik
Document Type
- Preprint (755)
- Article (401)
- Working Paper (119)
- Doctoral Thesis (93)
- Diploma Thesis (46)
- Conference Proceeding (41)
- Book (37)
- Bachelor Thesis (36)
- diplomthesis (29)
- Report (25)
- Part of a Book (13)
- Contribution to a Periodical (10)
- Master's Thesis (6)
- Habilitation (1)
- Lecture (1)
- Review (1)
Institute
- Informatik (1614)
- Frankfurt Institute for Advanced Studies (FIAS) (1008)
- Physik (986)
- Mathematik (56)
- Präsidium (41)
- Medizin (25)
- Biowissenschaften (21)
- Exzellenzcluster Makromolekulare Komplexe (8)
- Psychologie (8)
- Deutsches Institut für Internationale Pädagogische Forschung (DIPF) (5)
In this dissertation the formal abstraction and verification of analog circuits is examined. An approach is introduced that automatically abstracts a transistor-level circuit with full Spice accuracy into a hybrid automaton (HA) in various output languages. The generated behavioral model exhibits a significant simulation speed-up compared to the original netlist while maintaining acceptable accuracy, and can therefore be used in various verification and validation routines. On top of that, the generated models can be formally verified against their Spice netlists, making the obtained models correct by construction.
The generated abstract models can be extended to enclose modeling as well as technology-dependent parameter variations with little over-approximation. As these models enclose the various behaviors of the sampled netlists, they are of significant importance, since they can replace several simulations with a single reachability analysis or symbolic simulation. Moreover, these models can also be used in different verification routines, as demonstrated in this dissertation.
As the obtained models are described by HAs with linear behaviors in the locations, the abstract models can also be composed, thereby allowing the abstraction of complex analog circuits.
Depending on the specified modeling settings, including for example the number of locations of the HA and the description of the system behavior, the accuracy, speed-up, and various additional properties of the HA can be influenced. This is examined in detail in this dissertation. The underlying abstraction process is first covered in detail. Several extensions are then handled, including the modeling of HAs with parameter variations. The obtained models are then verified using various verification methodologies. The accuracy and speed-up of the abstraction methodology are finally evaluated on several transistor-level circuits, ranging from simple operational amplifiers up to complex circuits.
In this talk we presented a novel technique, based on Deep Learning, to determine the impact parameter of nuclear collisions at the CBM experiment. PointNet based Deep Learning models are trained on UrQMD followed by CBMRoot simulations of Au+Au collisions at 10 AGeV to reconstruct the impact parameter of collisions from raw experimental data such as hits of the particles in the detector planes, tracks reconstructed from the hits or their combinations. The PointNet models can perform fast, accurate, event-by-event impact parameter determination in heavy ion collision experiments. They are shown to outperform a simple model which maps the track multiplicity to the impact parameter. While conventional methods for centrality classification merely provide an expected impact parameter distribution for a given centrality class, the PointNet models predict the impact parameter from 2–14 fm on an event-by-event basis with a mean error of −0.33 to 0.22 fm.
The ongoing digitalization of educational resources and the use of the internet lead to a steady increase of potentially available learning media. However, many of the media which are used for educational purposes have not been designed specifically for teaching and learning. Usually, linguistic criteria of readability and comprehensibility as well as content-related criteria are used independently to assess and compare the quality of educational media. This also holds true for educational media used in economics. This article aims to improve the analysis of textual learning media used in economic education by drawing on threshold concepts. Threshold concepts are key terms in knowledge acquisition within a domain. From a linguistic perspective, however, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features. In three kinds of (German) resources, namely in textbooks, in newspapers, and on Wikipedia, we investigate the distributive profiles of 63 threshold concepts identified in economics education (which have been collected from threshold concept research). We looked at the threshold concepts' frequency distribution, their compound distribution, and their network structure within the three kinds of resources. The two main findings of our analysis show that firstly, the three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles. Secondly, Wikipedia definitely shows stronger associative connections between economic threshold concepts than the other sources. We discuss the findings in relation to adequate media use for teaching and learning—not only in economic education.
Learning to solve graph tasks is one of the key prerequisites of acquiring domain-specific knowledge in most study domains. Analyses of graph understanding often use eye-tracking and focus on analyzing how much time students spend gazing at particular areas of a graph—Areas of Interest (AOIs). To gain a deeper insight into students’ task-solving process, we argue that the gaze shifts between students’ fixations on different AOIs (so-termed transitions) also need to be included in holistic analyses of graph understanding that consider the importance of transitions for the task-solving process. Thus, we introduced Epistemic Network Analysis (ENA) as a novel approach to analyze eye-tracking data of 23 university students who solved eight multiple-choice graph tasks in physics and economics. ENA is a method for quantifying, visualizing, and interpreting network data allowing a weighted analysis of the gaze patterns of both correct and incorrect graph task solvers considering the interrelations between fixations and transitions. After an analysis of the differences in the number of fixations and the number of single transitions between correct and incorrect solvers, we conducted an ENA for each task. We demonstrate that an isolated analysis of fixations and transitions provides only a limited insight into graph solving behavior. In contrast, ENA identifies differences between the gaze patterns of students who solved the graph tasks correctly and incorrectly across the multiple graph tasks. For instance, incorrect solvers shifted their gaze from the graph to the x-axis and from the question to the graph comparatively more often than correct solvers. The results indicate that incorrect solvers often have problems transferring textual information into graphical information and rely more on partly irrelevant parts of a graph. Finally, we discuss how the findings can be used to design experimental studies and to inform innovative instructional procedures in higher education.
Volatility clustering and fat tails are prominently observed in financial markets. Here, we analyze the underlying mechanisms of three agent-based models explaining these stylized facts in terms of market instabilities and compare them on empirical grounds. To this end, we first develop a general framework for detecting tail events in stock markets. In particular, we introduce Hawkes processes to automatically identify and date onsets of market turmoils which result in increased volatility. Second, we introduce three different indicators to predict those onsets. Each of the three indicators is derived from and tailored to one of the models, namely quantifying information content, critical slowing down or market risk perception. Finally, we apply our indicators to simulated and real market data. We find that all indicators reliably predict market events on simulated data and clearly distinguish the different models. In contrast, a systematic comparison on the stocks of the Forbes 500 companies shows a markedly lower performance. Overall, predicting the onset of market turmoils appears difficult, yet, over very short time horizons high or rising volatility exhibits some predictive power.
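As an illustration of the Hawkes-process idea (a minimal sketch, not the authors' implementation; the kernel parameters, the event definition, and the threshold are assumptions), the following Python snippet evaluates the conditional intensity of an exponential-kernel Hawkes process over the times of extreme-return events and flags days where that intensity crosses a threshold:

```python
# Minimal sketch (not the paper's implementation): conditional intensity of a
# univariate Hawkes process with exponential kernel, used to flag "onsets"
# where the intensity of extreme-return events exceeds a threshold.
import numpy as np

def hawkes_intensity(event_times, t, mu=0.1, alpha=0.5, beta=1.0):
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    past = event_times[event_times < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

# Toy usage: "events" are days with absolute returns above the 95% quantile (synthetic data).
rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=1000) * 0.01          # fat-tailed toy returns
event_times = np.flatnonzero(np.abs(returns) > np.quantile(np.abs(returns), 0.95)).astype(float)

grid = np.arange(1000.0)
intensity = np.array([hawkes_intensity(event_times, t) for t in grid])
onsets = grid[intensity > 1.0]                             # threshold is an arbitrary choice
print(f"flagged {len(onsets)} candidate turmoil days")
```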
Monitoring is an indispensable tool for the operation of any large installation of grid or cluster computing, be it high energy physics or elsewhere. Usually, monitoring is configured to collect a small amount of data, just enough to enable detection of abnormal conditions. Once detected, the abnormal condition is handled by gathering all information from the affected components. This data is processed by querying it in a manner similar to a database.
This contribution shows how the metaphor of a debugger (for software applications) can be transferred to a compute cluster. The concepts of variables, assertions and breakpoints that are used in debugging can be applied to monitoring by defining variables as the quantities recorded by monitoring and breakpoints as invariants formulated via these variables. It is found that embedding fragments of a data extracting and reporting tool such as the UNIX tool awk facilitates concise notations for commonly used variables since tools like awk are designed to process large event streams (in textual representations) with bounded memory. A functional notation similar to both the pipe notation used in the UNIX shell and the point-free style used in functional programming simplifies the combination of variables that commonly occur when formulating breakpoints.
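As a rough analogue of this variables-and-breakpoints pattern (the paper embeds awk fragments; the Python below is only a hypothetical illustration, with invented field positions and thresholds), a monitoring variable can be expressed as a bounded-memory fold over an event stream and a breakpoint as an invariant over such variables:

```python
# Hypothetical Python analogue (not the paper's awk-based notation):
# a monitoring "variable" is a bounded-memory reduction over an event stream,
# and a "breakpoint" is an invariant formulated over such variables.
from typing import Callable, Iterable

def variable(stream: Iterable[str], field: int,
             reduce: Callable[[float, float], float], init: float) -> float:
    """Fold one numeric column of a whitespace-separated event stream (awk-style $field)."""
    acc = init
    for line in stream:
        cols = line.split()
        if len(cols) > field:
            acc = reduce(acc, float(cols[field]))
    return acc

def breakpoint_hit(invariant: Callable[[], bool]) -> bool:
    """A breakpoint fires when its invariant is violated."""
    return not invariant()

# Toy event stream: "<node> <load> <free_mem_mb>" (made-up monitoring records).
events = ["node1 0.7 2048", "node2 3.9 512", "node3 0.4 8192"]
max_load = variable(events, 1, max, 0.0)
min_mem = variable(events, 2, min, float("inf"))

if breakpoint_hit(lambda: max_load < 3.0 and min_mem > 1024):
    print(f"breakpoint: max_load={max_load}, min_mem={min_mem}")
```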
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling for the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, such a dependence would indicate a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In the way it measures the similarities of hypertexts, the article goes beyond existing approaches by examining their structural and semantic aspects intra- and intertextually. In this way it builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
A new method of event characterization based on Deep Learning is presented. The PointNet models can be used for fast, online event-by-event impact parameter determination at the CBM experiment. For this study, UrQMD and the CBM detector simulation are used to generate Au+Au collision events at 10 AGeV, which are then used to train and evaluate PointNet-based architectures. The models can be trained on features like the hit position of particles in the CBM detector planes, tracks reconstructed from the hits, or combinations thereof. The Deep Learning models reconstruct impact parameters from 2-14 fm with a mean error varying from -0.33 to 0.22 fm. For impact parameters in the range of 5-14 fm, a model which uses the combination of hit and track information of particles has a relative precision of 4-9% and a mean error of -0.33 to 0.13 fm. In the same range of impact parameters, a model with only track information has a relative precision of 4-10% and a mean error of -0.18 to 0.22 fm. This new method of event characterization is shown to be more accurate and less model-dependent than conventional methods and can utilize the performance boost of modern GPUs.
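A minimal PointNet-style regressor is sketched below (PyTorch; illustrative only, not the trained CBM models; layer sizes, hit counts and inputs are assumptions): per-hit features are extracted with a shared MLP, pooled with an order-invariant max, and mapped to a single impact-parameter estimate.

```python
# Illustrative sketch only, not the trained CBM models: a PointNet-style regressor
# mapping an unordered set of detector hits (x, y, z per hit) to one impact parameter b.
import torch
import torch.nn as nn

class PointNetRegressor(nn.Module):
    def __init__(self, in_dim=3):
        super().__init__()
        self.point_mlp = nn.Sequential(           # shared weights across points (1x1 convolutions)
            nn.Conv1d(in_dim, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, points):                    # points: (batch, n_hits, 3)
        x = points.transpose(1, 2)                # -> (batch, 3, n_hits)
        x = self.point_mlp(x)
        x = torch.max(x, dim=2).values            # order-invariant pooling over hits
        return self.head(x).squeeze(-1)           # predicted impact parameter (fm)

# Toy forward pass with random "hits"; real inputs would come from detector simulations.
model = PointNetRegressor()
hits = torch.randn(8, 500, 3)                     # 8 events, 500 hits each
b_pred = model(hits)
print(b_pred.shape)                               # torch.Size([8])
```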
The impact of columnar file formats on SQL‐on‐hadoop engine performance: a study on ORC and Parquet
(2019)
Columnar file formats provide an efficient way to store data to be queried by SQL‐on‐Hadoop engines. Related works consider the performance of the processing engine and the file format together, which makes it impossible to predict their individual impact. In this work, we propose an alternative approach: by executing each file format on the same processing engine, we compare the different file formats as well as their different parameter settings. We apply our strategy to two processing engines, Hive and SparkSQL, and evaluate the performance of two columnar file formats, ORC and Parquet. We use BigBench (TPCx‐BB), a standardized application‐level benchmark for Big Data scenarios. Our experiments confirm that the file format selection and its configuration significantly affect the overall performance. We show that ORC generally performs better on Hive, whereas Parquet achieves the best performance with SparkSQL. Using ZLIB compression brings up to 60.2% improvement with ORC, while Parquet achieves up to 7% improvement with Snappy. Exceptions are the queries involving text processing, which do not benefit from using any compression.
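For illustration only (a hypothetical PySpark sketch, not the BigBench/TPCx-BB setup used in the study; paths, data and the query are invented), the snippet below writes the same data once as ORC with ZLIB and once as Parquet with Snappy and times an identical aggregation on both:

```python
# Hypothetical PySpark sketch (not the benchmark setup from the study):
# write the same data as ORC (ZLIB) and Parquet (Snappy), then time one query per format.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-comparison").getOrCreate()
df = spark.range(0, 10_000_000).withColumnRenamed("id", "key")

df.write.mode("overwrite").option("compression", "zlib").orc("/tmp/demo_orc")
df.write.mode("overwrite").option("compression", "snappy").parquet("/tmp/demo_parquet")

for fmt, path in [("orc", "/tmp/demo_orc"), ("parquet", "/tmp/demo_parquet")]:
    start = time.time()
    count = spark.read.format(fmt).load(path).where("key % 7 = 0").count()
    print(f"{fmt}: {count} rows in {time.time() - start:.2f}s")

spark.stop()
```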
The goal was to implement a simulation environment with a road network and a KDNA that controls cars on this road network. For the simulation, a simple graphical representation was developed in which a variable number of cars drive on a pre-programmed road network. A KDNA steers these cars by controlling the accelerator, brake, and steering-wheel positions, with speed control and direction control taking place independently. When analyzing the KDNA for several cars, performance problems occurred whose source was examined more closely. The load was generated primarily by the communication required to manage the KDNA tasks in the AHS.
The goal of this work is to automatically analyze a text to determine whether it describes buildings and, if so, to visualize them. For this purpose, a prototype was developed that uses NLP software based on a UIMA pipeline to examine a text for building data and then visualizes this data as 3D models on a map. To determine the quality of the project, an evaluation was carried out in which the task was to assign paragraphs to their corresponding 3D models. The results showed a recognition rate of 88.67%. However, weaknesses in the standardization procedure for the parameters and in the one-sided way of visualizing were also revealed. Finally, it is presented how these weaknesses can be remedied with the help of an ontological model and how the project can be continued.
In this contribution we investigate development trends of infrastructures in the digital humanities. We argue that, as a consequence of (1) the availability of ever more data about socio-semiotic networks, (2) the inflation of methods in humanities disciplines, (3) the increasingly hybrid division of labor between humans and machines, and (4) the explosive growth of artificial texts, considerable pressure has arisen to adapt and further develop such infrastructures. In this context we describe three information systems that differ, among other things, in the interaction possibilities they offer their users to meet such challenges. With VienNA, we sketch a novel architecture for such systems which, owing to its flexibility, could offer the possibility of mastering the latter challenges.
The specific temporal evolution of bacterial and phage population sizes, in particular bacterial depletion and the emergence of a resistant bacterial population, can be seen as a kinetic fingerprint that depends on the manifold interactions of the specific phage–host pair during the course of infection. We have elaborated such a kinetic fingerprint for a human urinary tract Klebsiella pneumoniae isolate and its phage vB_KpnP_Lessing by a modeling approach based on data from in vitro co-culture. We found a faster depletion of the initially sensitive bacterial population than expected from simple mass action kinetics. A possible explanation for the rapid decline of the bacterial population is a synergistic interaction of phages which can be a favorable feature for phage therapies. In addition to this interaction characteristic, analysis of the kinetic fingerprint of this bacteria and phage combination revealed several relevant aspects of their population dynamics: A reduction of the bacterial concentration can be achieved only at high multiplicity of infection whereas bacterial extinction is hardly accomplished. Furthermore the binding affinity of the phage to bacteria is identified as one of the most crucial parameters for the reduction of the bacterial population size. Thus, kinetic fingerprinting can be used to infer phage–host interactions and to explore emergent dynamics which facilitates a rational design of phage therapies.
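As a baseline for comparison (a sketch with hypothetical parameters, not the fitted model from the study), simple mass-action kinetics of a sensitive bacterial population and its phage can be written as a small ODE system; the study's point is precisely that the observed depletion is faster than such a baseline predicts.

```python
# Minimal baseline sketch (all parameters invented): mass-action kinetics for a
# sensitive bacterial population B and a phage population P.
import numpy as np
from scipy.integrate import odeint

def mass_action(y, t, r, K, delta, burst, decay):
    B, P = y
    dB = r * B * (1 - B / K) - delta * B * P      # logistic growth minus infection
    dP = burst * delta * B * P - decay * P        # burst release minus phage decay
    return [dB, dP]

t = np.linspace(0, 24, 500)                       # hours
y0 = [1e6, 1e7]                                   # CFU/ml, PFU/ml (multiplicity of infection 10)
sol = odeint(mass_action, y0, t, args=(0.7, 1e9, 1e-8, 100.0, 0.1))
print("bacteria after 24 h:", sol[-1, 0])
```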
Software updates are a critical success factor in mobile app ecosystems. Through publishing regular updates, platform providers enhance their operating systems for the benefit of both end users and third-party developers. It is also a way of attracting new customers. However, this platform evolution poses the risk of inadvertently introducing software problems, which can severely disturb the ecosystem’s balance by compromising its foundational technologies. So far, little to no research has addressed this issue from a user-centered perspective. The thesis at hand draws on IS post-adoption literature to investigate the potential negative influences of operating system updates on mobile app users. The release of Apple’s iOS 13 update serves as research object. Based on over half a million user reviews from the AppStore, data mining techniques are applied to study the impact of the new platform version. The results show that iOS 13 caused complications with a large number of popular apps, leading to a significant decline in user ratings and an uptrend in negative sentiment. Feature requests, functional complaints, and device compatibility are identified as the three major issue categories. These issue types are compared in terms of their quantifiable negative effect on users’ continuance intention. In essence, the findings contribute to IS research on post-adoption behavior and provide guidance to ecosystem participants in dealing with update-induced platform issues.
BIOfid is a specialized information service currently being developed to mobilize biodiversity data dormant in printed historical and modern literature and to offer a platform for open access journals on the science of biodiversity. Our team of librarians, computer scientists and biologists produce high-quality text digitizations, develop new text-mining tools and generate detailed ontologies enabling semantic text analysis and semantic search by means of user-specific queries. In a pilot project we focus on German publications on the distribution and ecology of vascular plants, birds, moths and butterflies extending back to the Linnaeus period about 250 years ago. The three organism groups have been selected according to current demands of the relevant research community in Germany. The text corpus defined for this purpose comprises over 400 volumes with more than 100,000 pages to be digitized and will be complemented by journals from other digitization projects, copyright-free and project-related literature. With TextImager (Natural Language Processing & Text Visualization) and TextAnnotator (Discourse Semantic Annotation) we have already extended and launched tools that focus on the text-analytical section of our project. Furthermore, taxonomic and anatomical ontologies elaborated by us for the taxa prioritized by the project’s target group - German institutions and scientists active in biodiversity research - are constantly improved and expanded to maximize scientific data output. Our poster describes the general workflow of our project ranging from literature acquisition via software development, to data availability on the BIOfid web portal (http://biofid.de/), and the implementation into existing platforms which serve to promote global accessibility of biodiversity data.
Risk evaluations for agricultural chemicals are necessary to preserve healthy populations of honey bee colonies. Field studies on whole colonies are limited in behavioural research, while results from lab studies allow only restricted conclusions on whole colony impacts. Methods for automated long-term investigations of behaviours within comb cells, such as brood care, were hitherto missing. In the present study, we demonstrate an innovative video method that enables within-cell analysis in honey bee (Apis mellifera) observation hives to detect chronic sublethal neonicotinoid effects of clothianidin (1 and 10 ppb) and thiacloprid (200 ppb) on worker behaviour and development. In May and June, colonies which were fed 10 ppb clothianidin and 200 ppb thiacloprid in syrup over three weeks showed reduced feeding visits and duration throughout various larval development days (LDDs). On LDD 6 (capping day) total feeding duration did not differ between treatments. Behavioural adaptation was exhibited by nurses in the treatment groups in response to retarded larval development by increasing the overall feeding timespan. Using our machine learning algorithm, we demonstrate a novel method for detecting behaviours in an intact hive that can be applied in a versatile manner to conduct impact analyses of chemicals, pests and other stressors.
Measurement of ϒ(1S) elliptic flow at forward rapidity in Pb-Pb collisions at √sNN = 5.02 TeV
(2019)
The first measurement of the ϒ(1S) elliptic flow coefficient (v2) is performed at forward rapidity (2.5 < y < 4) in Pb–Pb collisions at √sNN = 5.02 TeV with the ALICE detector at the LHC. The results are obtained with the scalar product method and are reported as a function of transverse momentum (pT) up to 15 GeV/c in the 5%–60% centrality interval. The measured Υ(1S) v2 is consistent with 0 and with the small positive values predicted by transport models within uncertainties. The v2 coefficient in 2 < pT < 15 GeV/c is lower than that of inclusive J/ψ mesons in the same pT interval by 2.6 standard deviations. These results, combined with earlier suppression measurements, are in agreement with a scenario in which the Υ(1S) production in Pb–Pb collisions at LHC energies is dominated by dissociation limited to the early stage of the collision, whereas in the J/ψ case there is substantial experimental evidence of an additional regeneration component.
Dancing is an activity that positively enhances people's mood; it consists of feeling the music and expressing it in rhythmic movements of the body. Learning how to dance can be challenging because it requires proper coordination and an understanding of rhythm and beat. In this paper, we present the first implementation of the Dancing Coach (DC), a generic system designed to support the practice of dancing steps, which in its current state supports the practice of basic salsa dancing steps. However, the DC has been designed to allow the addition of more dance styles. We also present the first user evaluation of the DC, which consists of user tests with 25 participants. Results from the user tests show that participants stated they had learned the basic salsa dancing steps, how to move to the beat, and body coordination in a fun way. Results also point out some directions for improving future versions of the DC.
Iconographic representations on ancient artifacts are described in many existing databases and literature as human readable text. We applied Natural Language Processing (NLP) approaches in order to extract the semantics out of these textual descriptions and in this way enable semantic searches over them. This allows more sophisticated requests compared to the common existing keyword searches. As we show in our experiments based on numismatic datasets, the approach is generic in the sense that once the system is trained on one dataset, it can be applied without any further manual work also to datasets that have similar content. Of course, additional adaptations would further improve the results. Since the approach requires manual work only during the training phase, it can easily be applied to huge datasets without manual work and therefore without major extra costs. In fact, in our experience bigger datasets generate even better results because there is more data for training. Since our approach is not bound to a certain domain and the numismatic datasets are just an example, it could serve as a blueprint for many other areas. It could also help to build bridges between disciplines, since textual iconographic descriptions can also be found for pottery, sculpture, and elsewhere.
This paper is a contribution to exploring and analyzing space-improvements in concurrent programming languages, in particular in the functional process-calculus CHF. Space-improvements are defined as a generalization of the corresponding notion in deterministic pure functional languages. The main part of the paper is the O(n · log n) algorithm SPOPTN for offline space optimization of several parallel independent processes. Applications of this algorithm are: (i) affirmation of space-improving transformations for particular classes of program transformations; (ii) support of an interpreter-based method for refuting space-improvements; and (iii) use as a stand-alone offline optimizer for space (or similar resources) of parallel processes.
Correction to: Scientific Reports https://doi.org/10.1038/s41598-019-43857-5, published online 17 May 2019. In the original version of this Article, Jan-Hendrik Trösemeier was incorrectly affiliated with ‘Division of Allergology, Paul Ehrlich Institut, Langen, Germany’. The correct affiliations are listed below...
The subject of the work presented here is an application for virtual reality (VR) that is able to visualize the structure of an arbitrary text as a walkable, interactive city. In addition, the program offers a special text search that cannot be found in this form in other conventional text processing programs. Thanks to the structural analysis and the use of several exceptional analysis tools of the TextImager [2], text2City enables not only the search for specific text patterns but also, for example, the selection of the text level (word, sentence, paragraph, etc.) and more. Another feature is the communication link between the TextAnnotator service [1] and text2City, which gives the user the possibility to annotate, but can also immediately display annotations made by other people. Running the program requires one of the two VR headsets, Oculus Rift or HTC Vive, a VR-capable PC, and the Unity software.
Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons’ receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.
In pathology, tissue images are evaluated using a light microscope, relying on the expertise and experience of pathologists. There is a great need for computational methods to quantify and standardize histological observations. Computational quantification methods become more and more essential to evaluate tissue images. In particular, the distribution of tumor cells and their microenvironment are of special interest. Here, we systematically investigated tumor cell properties and their spatial neighborhood relations by a new application of statistical analysis to whole slide images of Hodgkin lymphoma, a tumor arising in lymph nodes, and inflammation of lymph nodes called lymphadenitis. We considered properties of more than 400,000 immunohistochemically stained, CD30-positive cells in 35 whole slide images of tissue sections from subtypes of the classical Hodgkin lymphoma, nodular sclerosis and mixed cellularity, as well as from lymphadenitis. We found that cells exhibited significantly favored and unfavored spatial neighborhood relations with other cells depending on their morphology. This information is important to evaluate differences between Hodgkin lymph nodes infiltrated by tumor cells (Hodgkin lymphoma) and inflamed lymph nodes, concerning the neighborhood relations of cells and the sizes of cells. The quantification of neighborhood relations revealed new insights into the relations of CD30-positive cells in different diagnosis cases. The approach is general and can easily be applied to whole slide image analysis of other tumor types.
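The following is only a toy illustration of neighborhood quantification (not the study's pipeline; the cell classes, coordinates, and radius are invented): it counts how often cells of one hypothetical morphology class have cells of another class within a fixed radius and compares the count with an expectation under complete spatial randomness.

```python
# Toy sketch (invented data, not the study's analysis): count neighborhood relations
# between two hypothetical cell classes within a fixed radius using a k-d tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
small_cells = rng.uniform(0, 1000, size=(500, 2))   # toy (x, y) centroids in micrometres
large_cells = rng.uniform(0, 1000, size=(200, 2))

tree = cKDTree(large_cells)
radius = 30.0                                        # arbitrary neighbourhood radius
pairs = tree.query_ball_point(small_cells, r=radius)
observed = sum(len(p) for p in pairs)

# Expectation under complete spatial randomness (simple area-fraction argument).
area_fraction = len(large_cells) * np.pi * radius**2 / 1000**2
expected = len(small_cells) * area_fraction
print(f"observed small->large neighbour links: {observed}, expected under CSR: {expected:.1f}")
```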
The morphology of presynaptic specializations can vary greatly ranging from classical single-release-site boutons in the central nervous system to boutons of various sizes harboring multiple vesicle release sites. Multi-release-site boutons can be found in several neural contexts, for example at the neuromuscular junction (NMJ) of body wall muscles of Drosophila larvae. These NMJs are built by two motor neurons forming two types of glutamatergic multi-release-site boutons with two typical diameters. However, it is unknown why these distinct nerve terminal configurations are used on the same postsynaptic muscle fiber. To systematically dissect the biophysical properties of these boutons we developed a full three-dimensional model of such boutons, their release sites and transmitter-harboring vesicles and analyzed the local vesicle dynamics of various configurations during stimulation. Here we show that the rate of transmission of a bouton is primarily limited by diffusion-based vesicle movements and that the probability of vesicle release and the size of a bouton affect bouton-performance in distinct temporal domains allowing for an optimal transmission of the neural signals at different time scales. A comparison of our in silico simulations with in vivo recordings of the natural motor pattern of both neurons revealed that the bouton properties resemble a well-tuned cooperation of the parameters release probability and bouton size, enabling a reliable transmission of the prevailing firing-pattern at diffusion-limited boutons. Our findings indicate that the prevailing firing-pattern of a neuron may determine the physiological and morphological parameters required for its synaptic terminals.
Relying on the theory of Saward (2010) and Disch (2015), we study political representation through the lens of representative claim-making. We identify a gap between the theoretical concept of claim-making and the empirical (quantitative) assessment of representative claims made in the real world’s representative contexts. Therefore, we develop a new approach to map and quantify representative claims in order to subsequently measure the reception and validation of the claims by the audience. To test our method, we analyse all the debates of the German parliament concerned with the introduction of the gender quota in German supervisory boards from 2013 to 2017 in a two-step process. At first, we assess which constituencies the MPs claim to represent and how they justify their stance. Drawing on multiple correspondence analysis, we identify different claim patterns. Second, making use of natural language processing techniques and logistic regression on social media data, we measure if and how the asserted claims in the parliamentary debates are received and validated by the respective audience. We come to the conclusion that the constituency as ultimate judge of legitimacy has not been comprehensively conceptualized yet.
The formulation of the Partial Information Decomposition (PID) framework by Williams and Beer in 2010 attracted a significant amount of attention to the problem of defining redundant (or shared), unique and synergistic (or complementary) components of mutual information that a set of source variables provides about a target. This attention resulted in a number of measures proposed to capture these concepts, theoretical investigations into such measures, and applications to empirical data (in particular to datasets from neuroscience). In this Special Issue on “Information Decomposition of Target Effects from Multi-Source Interactions” at Entropy, we have gathered current work on such information decomposition approaches from many of the leading research groups in the field. We begin our editorial by providing the reader with a review of previous information decomposition research, including an overview of the variety of measures proposed, how they have been interpreted and applied to empirical investigations. We then introduce the articles included in the special issue one by one, providing a similar categorisation of these articles into: i. proposals of new measures; ii. theoretical investigations into properties and interpretations of such approaches, and iii. applications of these measures in empirical studies. We finish by providing an outlook on the future of the field.
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open them to a larger research community. Our contribution is two-fold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Anaplastic large cell lymphoma (ALCL) and classical Hodgkin lymphoma (cHL) are lymphomas that contain CD30-expressing tumor cells and have numerous pathological similarities. Whereas ALCL is usually diagnosed at an advanced stage, cHL more frequently presents with localized disease. The aim of the present study was to elucidate the mechanisms underlying the different clinical presentation of ALCL and cHL. Chemokine and chemokine receptor expression were similar in primary ALCL and cHL cases apart from the known overexpression of the chemokines CCL17 and CCL22 in the Hodgkin and Reed-Sternberg (HRS) cells of cHL. Consistent with the overexpression of these chemokines, primary cHL cases encountered a significantly denser T cell microenvironment than ALCL. Additionally to differences in the interaction with their microenvironment, cHL cell lines presented a lower and less efficient intrinsic cell motility than ALCL cell lines, as assessed by time-lapse microscopy in a collagen gel and transwell migration assays. We thus propose that the combination of impaired basal cell motility and differences in the interaction with the microenvironment hamper the dissemination of HRS cells in cHL when compared with the tumor cells of ALCL.
Our recently developed LRSX Tool implements a technique to automatically prove the correctness of program transformations in higher-order program calculi which may permit recursive let-bindings as they occur in functional programming languages. A program transformation is correct if it preserves the observational semantics of programs. In our tool the so-called diagram method is automated by combining unification, matching, and reasoning on alpha-renamings on the higher-order metalanguage, and automating induction proofs via an encoding into termination problems of term rewrite systems. We explain the techniques, we illustrate the usage of the tool, and we report on experiments.
The focus of this paper is space-improvements of programs, which are transformations that do not worsen the space requirement during evaluations. A realistic theoretical treatment must take the garbage collection method into account. We investigate space improvements under the assumption of an optimal garbage collector. Such a garbage collector is not implementable, but there is an advantage: the investigations are independent of potential changes in an implementable garbage collector, and our results show that evaluation and other similar transformations are space-improvements.
Human readers have the ability to infer knowledge from text, even if that particular information is not explicitly stated. In this thesis, we address the phenomena of text-level implicit information and outline novel automated methods for its recovery. The main focus of this work is on two types of unexpressed content that arises between sentences (implicit discourse relations) and within sentences (implicit semantic roles). Traditional approaches mostly rely on costly rich linguistic features, e.g., sentiment or frame-based lexicons, and require heuristics or manual feature engineering. As an improvement, we propose a collection of generic resource-lean methods, implemented in the form of statistical background knowledge or by means of neural architectures. Our models are largely language-independent and produce state-of-the-art performance, e.g., in the classification of Chinese implicit discourse relations, or the detection of locally covert predicative arguments in free texts. In novel experiments, we quantitatively demonstrate that both types of implicit information are mutually dependent insofar as, for instance, some implicit roles directly correlate with implicit discourse relations of similar properties. We show that implicit information processing further benefits downstream applications and demonstrate its applicability to the higher-level task of narrative story understanding. In the conclusion of the dissertation, we argue for the need of implicit information processing in order to realize the goal of true natural language understanding.
The development of multimodal sensor-based applications designed to support learners with the improvement of their skills is expensive since most of these applications are tailor-made and built from scratch. In this paper, we show how the Presentation Trainer (PT), a multimodal sensor-based application designed to support the development of public speaking skills, can be modularly extended with a Virtual Reality real-time feedback module (VR module), which makes usage of the PT more immersive and comprehensive. The described study consists of a formative evaluation and has two main objectives. Firstly, a technical objective is concerned with the feasibility of extending the PT with an immersive VR Module. Secondly, a user experience objective focuses on the level of satisfaction of interacting with the VR extended PT. To study these objectives, we conducted user tests with 20 participants. Results from our test show the feasibility of modularly extending existing multimodal sensor-based applications, and in terms of learning and user experience, results indicate a positive attitude of the participants towards using the application (PT+VR module).
Programmable hardware in the form of FPGAs found its place in various high energy physics experiments over the past few decades. These devices provide highly parallel and fully configurable data transport, data formatting, and data processing capabilities with custom interfaces, even in rigid or constrained environments. Additionally, FPGA functionalities and the number of their logic resources have grown exponentially in the last few years, making FPGAs more and more suitable for complex data processing tasks. ALICE is one of the four main experiments at the LHC and specialized in the study of heavy-ion collisions. The readout chain of the ALICE detectors makes use of FPGAs at various places. The Read-Out Receiver Cards (RORCs) are one example of FPGA-based readout hardware, building the interface between the custom detector electronics and the commercial server nodes in the data processing clusters of the Data Acquisition (DAQ) system as well as the High Level Trigger (HLT). These boards are implemented as server plug-in cards with serial optical links towards the detectors. Experimental data is received via more than 500 optical links, already partly pre-processed in the FPGAs, and pushed towards the host machines. Computer clusters consisting of a few hundred nodes collect, aggregate, compress, reconstruct, and prepare the experimental data for permanent storage and later analysis. With the end of the first LHC run period in 2012 and the start of Run 2 in 2015, the DAQ and HLT systems were renewed and several detector components were upgraded for higher data rates and event rates. Increased detector link rates and obsolete host interfaces rendered it impossible to reuse the previous RORCs in Run 2.
This thesis describes the development, integration, and maintenance of the next generation of RORCs for ALICE in Run 2. A custom hardware platform, initially developed as a joint effort between the ALICE DAQ and HLT groups in the course of this work, found its place in the Run 2 readout systems of the ALICE and ATLAS experiments. The hardware fulfills all experiment requirements, matches its target performance, and has been running stable in the production systems since the start of Run 2. Firmware and software developments for the hardware evaluation, the design of the board, the mass production hardware tests, as well as the operation of the final board in the HLT, were carried out as part of this work. 74 boards were integrated into the HLT hardware and software infrastructure, with various firmware and software developments, to provide the main experimental data input and output interface of the HLT for Run 2. The hardware cluster finder, an FPGA-based data pre-processing core from the previous generation of RORCs, was ported to the new hardware. It has been improved and extended to meet the experimental requirements throughout Run 2. The throughput of this firmware component could be doubled and the algorithm extended, providing an improved noise rejection and an increased overall mean data compression ratio compared to its previous implementation. The hardware cluster finder forms a crucial component in the HLT data reconstruction and compression scheme with a processing performance of one board equivalent to around ten server nodes for comparable processing steps in software.
The work on the firmware development, especially on the hardware cluster finder, once more demonstrated that developing and maintaining data processing algorithms with the common low-level hardware description methods is tedious and time-consuming. Therefore, a high-level synthesis (HLS) hardware description method applying dataflow computing at an algorithmic level to FPGAs was evaluated in this context. The hardware cluster finder served as an example of a typical data processing algorithm in a high energy physics readout application. The existing and highly optimized low-level implementation provided a reference for comparisons in terms of throughput and resource usage. The cluster finder algorithm could be implemented in the dataflow description with comparably little effort, providing fast development cycles, compact code and, at the same time, simplified extension and maintenance options. The performance results in terms of throughput and resource usage are comparable to the manual implementation. The dataflow environment proved to be highly valuable for design space explorations. An integration of the dataflow description into the HLT firmware and software infrastructure could be demonstrated as a proof of concept. A high-level hardware description could ease the design space exploration, the initial development, the maintenance, and the extension of hardware algorithms for high energy physics readout applications.
Background: Microarray analysis represents a powerful way to test scientific hypotheses on the functionality of cells. The measurements consider the whole genome, and the large amount of generated data requires sophisticated analysis. To date, no gold standard for the analysis of microarray images has been established. Due to the lack of a standard approach, there is a strong need to identify new processing algorithms.
Methods: We propose a novel approach based on hyperbolic partial differential equations (PDEs) for unsupervised spot segmentation. Prior to segmentation, morphological operations were applied for the identification of co-localized groups of spots. A grid alignment was performed to determine the borderlines between rows and columns of spots. PDEs were applied to detect the inflection points within each column and row; vertical and horizontal luminance profiles were evolved respectively. The inflection points of the profiles determined borderlines that confined a spot within adapted rectangular areas. A subsequent k-means clustering determined the pixels of each individual spot and its local background.
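To illustrate only the final clustering step of the method on synthetic data (the PDE-based grid alignment is not reproduced here; all intensities and the patch geometry are made up), k-means with two clusters separates spot pixels from local background within one rectangular region:

```python
# Minimal sketch of the final clustering step only (synthetic data, not the paper's pipeline):
# k-means with k=2 on pixel intensities of one rectangular spot region separates
# foreground (spot) pixels from the local background.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patch = rng.normal(200, 20, size=(21, 21))              # toy background intensities
yy, xx = np.mgrid[-10:11, -10:11]
patch[xx**2 + yy**2 <= 36] += 600                        # bright circular "spot" in the centre

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(patch.reshape(-1, 1))
spot_label = labels[patch.reshape(-1).argmax()]          # cluster containing the brightest pixel
spot_mean = patch.reshape(-1)[labels == spot_label].mean()
bg_mean = patch.reshape(-1)[labels != spot_label].mean()
print(f"spot intensity {spot_mean:.0f} vs local background {bg_mean:.0f}")
```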
Results: We evaluated the approach for a data set of microarray images taken from the Stanford Microarray Database (SMD). The data set is based on two studies on global gene expression profiles of Arabidopsis thaliana. We computed values for spot intensity, regression ratio, and coefficient of determination. For spots with irregular contours and inner holes, we found intensity values that were significantly different from those determined by the GenePix Pro microarray analysis software. We determined the set of differentially expressed genes from our intensities and identified more activated genes than were predicted by the GenePix software.
Conclusions: Our method represents a worthwhile alternative and complement to standard approaches used in industry and academia. We highlight the importance of our spot segmentation approach, which identified additional important genes, helping to better explain the molecular mechanisms that are activated in defense responses to virus and pathogen infection.
The production of K∗(892)0 and ϕ(1020) mesons has been measured in p–Pb collisions at √sNN = 5.02 TeV. K∗0 and ϕ are reconstructed via their decay into charged hadrons with the ALICE detector in the rapidity range -0.5 < y < 0. The transverse momentum spectra, measured as a function of the multiplicity, have a pT range from 0 to 15 GeV/c for K∗0 and from 0.3 to 21 GeV/c for ϕ. Integrated yields, mean transverse momenta and particle ratios are reported and compared with results in pp collisions at √s = 7 TeV and Pb–Pb collisions at √sNN = 2.76 TeV. In Pb–Pb and p–Pb collisions, K∗0 and ϕ probe the hadronic phase of the system and contribute to the study of particle formation mechanisms by comparison with other identified hadrons. For this purpose, the mean transverse momenta and the differential proton-to-ϕ ratio are discussed as a function of the multiplicity of the event. The short-lived K∗0 is measured to investigate re-scattering effects, believed to be related to the size of the system and to the lifetime of the hadronic phase.
Heterologously expressed genes require adaptation to the host organism to ensure adequate levels of protein synthesis, which is typically approached by replacing codons by the target organism’s preferred codons. In view of frequently encountered suboptimal outcomes we introduce the codon-specific elongation model (COSEM) as an alternative concept. COSEM simulates ribosome dynamics during mRNA translation and informs about protein synthesis rates per mRNA in an organism- and context-dependent way. Protein synthesis rates from COSEM are integrated with further relevant covariates such as translation accuracy into a protein expression score that we use for codon optimization. The scoring algorithm further enables fine-tuning of protein expression including deoptimization and is implemented in the software OCTOPOS. The protein expression score produces competitive predictions on proteomic data from prokaryotic, eukaryotic, and human expression systems. In addition, we optimized and tested heterologous expression of manA and ova genes in Salmonella enterica serovar Typhimurium. Superiority over standard methodology was demonstrated by a threefold increase in protein yield compared to wildtype and commercially optimized sequences.
Embedded systems are computer systems that are embedded in a technical environment and perform their work there. A characteristic of today's and future embedded systems is that they appear in ever greater numbers in industry, in households and offices, in trains and airplanes, and in many other environments. They are often highly interconnected and must be highly reliable in order to avoid accidents and thus protect users from harm. Mastering these embedded systems is usually highly complex, since a large number of components cooperate through networking. It is therefore difficult for the user to keep track of them. With regard to reliability, it is important that reactions to faults and unforeseen situations in these systems are delivered within defined time bounds in order to avoid damage.
Self-organization is nowadays regarded as a proven means of mastering the challenges that arise with the commissioning, use, and maintenance of complex embedded systems. The contribution of this work is an investigation of self-organizing embedded systems:
The first part gives an overview of the current state of research on embedded systems as well as on the field of self-organization for embedded systems. The idea of Organic Computing, which deals with self-organization principles in IT systems, is described, and current research trends in this area are presented.
The second part of the work presents own contributions in the field of self-organizing embedded systems. They address various aspects of an artificial hormone system (AHS), which can be used for the self-organized distribution of tasks over a set of networked processors. On the one hand, fundamental definitions of Organic Computing are examined and evaluated with respect to the AHS. On the other hand, new learning techniques for the AHS, inspired by machine learning, are investigated. In addition, a multi-level AHS is developed and evaluated in order to enable the assignment of a very large number of tasks (≥ 1000) to a very large number of processors (≥ 10000).
CRFVoter : gene and protein related object recognition using a conglomerate of CRF-based tools
(2019)
Background: Gene and protein related objects are an important class of entities in biomedical research, whose identification and extraction from scientific articles is attracting increasing interest. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of gene and protein related objects. For this purpose, we transform the task as posed by BioCreative V.5 into a sequence labeling problem. We present a series of sequence labeling systems that we used and adapted in our experiments for solving this task. Our experiments show how to optimize the hyperparameters of the classifiers involved. To this end, we utilize various algorithms for hyperparameter optimization. Finally, we present CRFVoter, a two-stage application of Conditional Random Field (CRF) that integrates the optimized sequence labelers from our study into one ensemble classifier.
Results: We analyze the impact of hyperparameter optimization regarding named entity recognition in biomedical research and show that this optimization results in a performance increase of up to 60%. In our evaluation, our ensemble classifier based on multiple sequence labelers, called CRFVoter, outperforms each individual extractor. For the blinded test set provided by the BioCreative organizers, CRFVoter achieves an F-score of 75%, a recall of 71% and a precision of 80%. For the GPRO type 1 evaluation, CRFVoter achieves an F-score of 73%, a recall of 70%, and the best precision (77%) among all task participants.
Conclusion: CRFVoter is effective when multiple sequence labeling systems are to be used and performs better than the individual systems it combines.
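CRFVoter itself is a two-stage CRF application, but the basic idea of combining several sequence labelers can be illustrated with a naive token-level majority vote (a hypothetical sketch with invented labels, not the systems used in the study):

```python
# Naive illustration only (not CRFVoter's two-stage CRF): combine the outputs of
# several sequence labelers by a token-level majority vote.
from collections import Counter
from typing import List

def majority_vote(predictions: List[List[str]]) -> List[str]:
    """predictions[i] is the label sequence of labeler i for one sentence."""
    assert len({len(p) for p in predictions}) == 1, "labelers must agree on tokenisation"
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

# Hypothetical outputs of three labelers for the tokens of one sentence.
crf = ["O", "B-GENE", "I-GENE", "O"]
lstm = ["O", "B-GENE", "O", "O"]
svm = ["O", "B-GENE", "I-GENE", "O"]
print(majority_vote([crf, lstm, svm]))   # ['O', 'B-GENE', 'I-GENE', 'O']
```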
We explore space improvements in LRP, a polymorphically typed call-by-need functional core language. A relaxed space measure is chosen for the maximal size usage during an evaluation. It abstracts from the details of the implementation via abstract machines, but it takes garbage collection into account and can thus be seen as a realistic approximation of space usage. The results are: a context lemma for space-improving translations and for space equivalences; all but one reduction rule of the calculus are shown to be space improvements, and the exceptional one, the copy-rule, is shown to increase space only moderately.
Several further program transformations are shown to be space improvements or space equivalences; in particular, the translation into machine expressions is a space equivalence. These results are a step forward in making predictions about the change in runtime space behavior of optimizing transformations in call-by-need functional languages.
The synchronous pi-calculus is translated into a core language of Concurrent Haskell extended by futures (CHF). The translation simulates the synchronous message-passing of the pi-calculus by sending messages and adding synchronization using Concurrent Haskell's mutable shared-memory locations (MVars). The semantic criterion is a contextual semantics of the pi-calculus and of CHF using may- and should-convergence as observations. The results are equivalence with respect to the observations, full abstraction of the translation of closed processes, and adequacy of the translation on open processes. The translation transports the semantics of the pi-calculus processes under rather strong criteria, since error-free programs are translated into error-free ones, and programs without non-deterministic error possibilities are also translated into programs without non-deterministic error-possibilities. This investigation shows that CHF embraces the expressive power and the concurrency capabilities of the pi-calculus.
The hepatitis C virus (HCV) RNA replication cycle is a dynamic intracellular process occurring in three-dimensional space (3D), which is difficult both to capture experimentally and to visualize conceptually. HCV-generated replication factories are housed within virus-induced intracellular structures termed membranous webs (MW), which are derived from the endoplasmic reticulum (ER). Recently, we published spatiotemporally resolved 3D diffusion–reaction models of the HCV RNA replication cycle by means of surface partial differential equation (sPDE) descriptions. We distinguished between the basic components of the HCV RNA replication cycle, namely HCV RNA, non-structural viral proteins (NSPs), and a host factor. In particular, we evaluated the sPDE models upon realistically reconstructed intracellular compartments (ER/MW). In this paper, we propose a significant extension of the model based upon two additional aspects: different aggregate states of HCV RNA and NSPs, and population-dynamics-inspired diffusion and reaction coefficients instead of multilinear ones. The combination of both aspects enables realistic modeling of viral replication at all scales. Specifically, we describe a replication complex state consisting of HCV RNA together with a defined amount of NSPs. As a result of the combination of spatial resolution and different aggregate states, the new model mimics a cis requirement for HCV RNA replication. We used heuristic parameters for our simulations, which were run only on a subsection of the ER. Nevertheless, this was sufficient to allow the fitting of core aspects of virus reproduction, at least qualitatively. Our findings should help stimulate new model approaches and experimental directions for virology.
"Prognosen sind schwierig, besonders, wenn sie die Zukunft betreffen", sagt ein geflügeltes Wort. Die letzte Finanzkrise ist dafür ein gutes Beispiel, denn die wenigsten Analysten und Wirtschaftsweisen haben sie kommen sehen. Da Finanzkrisen glücklicherweise selten sind, ist es allerdings schwierig, Modelle zu entwickeln, die rechtzeitig vor einem Crash warnen.
Die wahrscheinlich beste Entscheidung : wie Online-Algorithmen mit der unsicheren Zukunft rechnen
(2018)
Is it worthwhile for a skiing beginner to buy skis in a year with uncertain snowfall? Or is it cheaper to rent them? We often have to make decisions without having sufficient information about the future. This applies even more to computer systems that process large amounts of data and have to make fast decisions. To enable them to work successfully despite numerous uncertainties, computer scientists develop online algorithms.
Artificial intelligence (AI), that is, intelligent software, nowadays performs tasks that were once thought to be reserved for humans. It has already arrived in many areas of our society – think of self-driving vehicles, medical diagnostics, translation software, personal conversational assistants, search functions and robotics. But how far can we trust AI systems?
LSTMVoter : chemical named entity recognition using a conglomerate of sequence labeling tools
(2019)
Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier.
Results: We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores information about features that is modeled by means of an attention mechanism. LSTMVoter outperforms each extractor integrated by it in a series of experiments. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%.
Availability and implementation: Data and code are available at https://github.com/texttechnologylab/LSTMVoter.
With the Smart Learning infrastructure, a novel didactic concept for continuing-education courses was developed. This infrastructure can be applied in many ways. First analyses of courses show that participants who worked through all exercises correctly achieve a grade better than the average. This contribution describes a concept for a gamification module that uses playful elements to encourage participants as early as possible to work through all exercises of a course correctly and thoughtfully.
We propose and create a new data model for learning-specific environments and learning analytics applications. It is motivated by our experience with the Fiber Bundle Data Model used for large, time- and space-dependent data. Our proposed data model integrates file- or stream-based data structures from capturing devices more easily. Learning analytics algorithms are added directly to the data, and queries and analytics are formulated in Python. The model is designed to improve collaboration in the field of learning analytics. We leverage a hierarchical data structure in which varying data is located near the leaves. Abstract data types are identified in four distinct pathways, which allow storing most diverse data sources. We compare different implementations regarding their memory footprint and performance. Our tests indicate that LeAn Bundles can be smaller than a naïve xAPI export. The benchmarks show that the performance is comparable to a MongoDB, while having the benefit of being portable and extensible.
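As a purely hypothetical illustration of the kind of hierarchical, leaf-heavy layout and Python-formulated queries described in the abstract above, the following toy structure shows one possible reading; the field names and nesting are invented and are not the LeAn Bundle schema of the paper.

    # Hypothetical hierarchical "bundle": metadata near the root,
    # varying event data near the leaves, queried in plain Python.
    bundle = {
        "course": "intro-programming",
        "learners": {
            "l001": {"sessions": {"2018-04-12": {"events": [
                {"verb": "attempted", "object": "exercise-3", "score": 0.5},
                {"verb": "completed", "object": "exercise-3", "score": 1.0},
            ]}}},
        },
    }

    def events(bundle, verb=None):
        """Walk the hierarchy down to the leaf-level event lists."""
        for learner in bundle["learners"].values():
            for session in learner["sessions"].values():
                for ev in session["events"]:
                    if verb is None or ev["verb"] == verb:
                        yield ev

    # Example query: total score of all "completed" events.
    print(sum(ev["score"] for ev in events(bundle, verb="completed")))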
Digitale Kompetenzen von Hochschullehrenden messen : Validierungsstudie eines Kompetenzrasters
(2018)
This contribution describes the development of a competence grid for assessing the digital competencies of university teachers and presents results of the validation of the grid. To this end, the results of a pre-test (N=90) among participants of an e-learning qualification programme are evaluated with inferential statistics. In addition, for external validation of the competence grid, the results are compared with statements of the survey participants that were obtained from e-portfolios using qualitative methods. The scale analyses yielded clear, single-factor solutions with good explained variance for six of the eight subdimensions of digital competence. The subscales show high internal consistencies. Two dimensions split factor-analytically into further subtests, which also prove to be reliable. Positive evidence for the validity of the competence grid was gathered through correspondences with statements from e-portfolios.
Wie eine "Heilslehre" überzieht der Begriff "Digitalisierung" fast alle Lebensbereiche – natürlich auch den Bildungsbereich. Gerade wir Informatiker*innen sind gefordert, diese Wege der Bildungstransformation mitzugestalten. Wir zusammen mit den Erziehungswissenschaftler*innen und Psychologen*innen müssen identifizieren, aufzeigen und vorbildlich umsetzen, was sinnvoll und möglich ist. Wir sind diejenigen, die die Bedingungen des Gelingens und auch die der Irrwege erforschen und aufzeigen müssen. Digitalisierungswahnsinn brauchen wir nicht!
The 16th annual conference DeLFI 2018 of the Special Interest Group on e-Learning of the Gesellschaft für Informatik e. V. takes place from 10 to 13 September 2018 at Johann Wolfgang Goethe University, Frankfurt am Main, together with the 8th conference on computer science education in higher education, HDI 2018. ...
The endoplasmic reticulum–mitochondria encounter structure (ERMES) connects the mitochondrial outer membrane with the ER. Multiple functions have been linked to ERMES, including maintenance of mitochondrial morphology, protein assembly and phospholipid homeostasis. Since the mitochondrial distribution and morphology protein Mdm10 is present in both ERMES and the mitochondrial sorting and assembly machinery (SAM), it is unknown how the ERMES functions are connected on a molecular level. Here we report that conserved surface areas on opposite sides of the Mdm10 β-barrel interact with SAM and ERMES, respectively. We generated point mutants to separate protein assembly (SAM) from morphology and phospholipid homeostasis (ERMES). Our study reveals that the β-barrel channel of Mdm10 serves different functions. Mdm10 promotes the biogenesis of α-helical and β-barrel proteins at SAM and functions as integral membrane anchor of ERMES, demonstrating that SAM-mediated protein assembly is distinct from ER-mitochondria contact sites.
In this paper, we study the limit of compactness, a graph index originally introduced for measuring structural characteristics of hypermedia. Applying compactness to large-scale small-world graphs, Mehler (2008) observed its limit behaviour to be equal to 1. The striking question concerning this finding was whether this limit behaviour resulted from the specifics of small-world graphs or was simply an artefact. In this paper, we determine the necessary and sufficient conditions for any sequence of connected graphs to result in a limit value of CB = 1, which can be generalized, with some consideration, to the case of disconnected graph classes (Theorem 3). This result can be applied to many well-known classes of connected graphs. Here, we illustrate it by considering four examples. In fact, our proof-theoretical approach allows for quickly obtaining the limit value of compactness for many graph classes, sparing computational costs.
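For readers unfamiliar with the index, the following minimal sketch computes compactness C_B in the sense commonly attributed to Botafogo et al. for hypertext graphs, using the conversion constant K = n for unreachable node pairs; it is an illustrative reading of the measure discussed above, not the authors' implementation, and the graphs used are toy examples.

    import networkx as nx

    def compactness(G: nx.Graph) -> float:
        # C_B = (Max - sum of converted distances) / (Max - Min),
        # with Max = (n^2 - n) * K and Min = (n^2 - n).
        n = G.number_of_nodes()
        K = n                                   # converted distance for unreachable pairs
        dist = dict(nx.all_pairs_shortest_path_length(G))
        total = sum(dist[u].get(v, K) for u in G for v in G if u != v)
        cmax = (n * n - n) * K                  # completely disconnected graph
        cmin = (n * n - n)                      # complete graph (all distances 1)
        return (cmax - total) / (cmax - cmin)

    # A complete graph yields 1.0, a path graph a markedly smaller value.
    print(compactness(nx.complete_graph(10)))
    print(compactness(nx.path_graph(10)))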
Virtual machines are for the most part not used inside high-energy physics (HEP) environments. Even though they provide a high degree of isolation, the performance overhead they introduce is too great for them to be used. With the rising number of container technologies and their increasing separation capabilities, HEP environments are evaluating whether they could utilize this technology. Container images are small and self-contained, which allows them to be easily distributed throughout the global environment. They also offer near-native performance while at the same time providing an often acceptable level of isolation. Only the needed services and libraries are packed into an image and executed directly by the host kernel. This work compared the performance impact of the three container technologies Docker, rkt and Singularity. The host kernel was additionally hardened with grsecurity and PaX to strengthen its security and make exploitation from inside a container harder. The execution time of a physics simulation was used as a benchmark. The results show that the different container technologies have a different impact on the performance. The performance loss on a stock kernel is small; in some cases the containerized runs were even faster than runs without a container. Docker showed overall the best performance on a stock kernel. The difference on a hardened kernel was bigger than on a stock kernel, but in favor of the container technologies. rkt performed better than all the others in almost all cases.
Web-based trainings (WBTs) are multimedia-based, interactive and thematically self-contained learning units delivered in a browser. Since the emergence of the Internet in the 1990s, they have been an important and established building block in the design and development of e-learning scenarios. These learning units are usually created by teachers using appropriate authoring systems. In rarer cases, they are individually programmed one-off solutions. Looking at WBTs from the learners' perspective, it can be observed that content not explicitly created as learning units is increasingly being used as well, content which nevertheless exactly matches the needs of the respective learner (in the context of informal and self-directed learning). On the one hand, this is due to the growing availability and variety of "alternative learning content" on the Internet in general (free licences and innovative authoring tools). On the other hand, it is due to the possibility of easily finding this content from anywhere and at any time (mobile Internet, search engines and voice assistants) or of having it curated and recommended (recommender systems and social media).
Against the background of this change, the central research question of this dissertation is whether the concept of a dedicated WBT authoring system still meets the new requirements posed by freely available, interactive learning content (Khan Academy, YouTube and Wikipedia) and a large, constantly growing number of free authoring tools for arbitrary web content (H5P, PowToon or Pageflow), and where, in this case, the unique characteristics of a WBT actually lie.
To answer this question, the thesis fundamentally examines the term "web-based training", the framework conditions that have changed over time, and the resulting implications for the development of WBT authoring systems. Using the chosen design-based research (DBR) approach, the term WBT could be redefined and reinterpreted through continuous cycles of design, implementation, analysis and re-design, using several e-learning projects as examples, so that the focus of the definition concentrates on what distinguishes WBTs at their core from other content and functions on the Internet: the teaching/learning aspect (hereafter Web-based Training 2.0 (WBT 2.0)).
Based on this new definition, four core functionalities were elaborated that address the aforementioned challenges and are described in detail in the form of a design framework. The various aspects and functions of WBTs 2.0 were investigated and developed along the iterative "meso cycles" of the DBR approach, with each of the projects carried out within them also yielding its own results, which were discussed from didactic and, above all, technical points of view. The insights gained in this way fed into the development process of the LernBar ("macro cycle"), a WBT authoring system developed in the context of this thesis and by studiumdigitale, the central e-learning unit of Goethe University. The developments were continuously reviewed and refined, taking user feedback into account (annual user meetings, trainings, surveys, support).
Finally, the last development cycle of the DBR approach concludes with the design and implementation of three WBT 2.0 system components, which make it possible to flexibly extend arbitrary web content with corresponding WBT 2.0 functionalities, in order to make activities carried out in the context of open teaching/learning processes transparent, traceable and thus verifiable (constructive alignment).
This research thus offers an interdisciplinary, user-centred and practice-proven approach for the implementation and use of WBTs in the context of open teaching/learning processes. The previous focus shifts from pure media production towards a holistic approach in which the teaching/learning aspect is in the foreground (identifying, meeting and verifying a learning need). The decisive point is that all resources available on the Internet can be used to meet a learning need, with WBTs 2.0 merely defining the didactic process and making it transparent and accessible for teachers and learners.
WBTs 2.0 will thus benefit in the future from the increasing variety and availability of content and functions on the Internet and allow the developers of WBT 2.0 authoring systems to concentrate on the essentials: the teaching/learning process.
This thesis can be classified in the field of data science. Data science uses methods from computer science, algorithms from mathematics and statistics, and domain knowledge in order to analyse large amounts of data and gain new insights. In this thesis, several research areas from these fields are used. They comprise data analysis in the area of big data (social networks, short messages from Twitter), opinion mining (analysis of opinions based on a lexicon of opinion-bearing phrases) and topic detection. ...
Result 1: Sentiment Phrase List (SePL)
In the research area of opinion mining, lists of opinion-bearing words play an essential role in the analysis of opinion expressions. The procedure developed in this thesis for the automated generation of such a list makes an important research contribution in this field. The novel approach makes it possible, on the one hand, to include phrases consisting of several words (including negations, intensifying and attenuating particles) as well as idioms; on the other hand, the sentiment values of all phrases are computed automatically on the basis of a corresponding corpus. The Sentiment Phrase List and the procedure have been published and can be used by the research community [121, 123]. The construction is based on a textual as well as an additional numerical rating, as typically provided in customer reviews (for example the title and the star rating of Amazon customer reviews). Further data sources that provide such ratings can be used as well. Based on approximately 1.5 million German customer reviews, several versions of the SePL were created and published [120].
Result 2: Algorithm based on the SePL
With the help of the SePL and the opinion-bearing phrases it contains, lexicon-based methods for analysing opinion expressions can be improved. In texts, phrases are frequently separated by other words, which makes an explicit identification of the phrases necessary. The algorithm for lexicon-based opinion analysis has been published [176]. It is based on opinion-bearing phrases consisting of one or more words. Since individual phrases carry distinct sentiment values, a more precise assessment than with previous approaches is possible: opinion-bearing phrases can be extracted from the text and scored in a differentiated way according to the entries contained in the SePL. Previous approaches often use single opinion-bearing words only. The sentiment value of, for example, a negated expression then does not have to be derived by a generic rule; in current methods the value of an opinion-bearing word is usually inverted when a negation is present, which frequently yields wrong results. In the best case, the list contains a sentiment value both for the individual word and for its negation (e.g. "schön" and "nicht schön").
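The core idea of phrase-level lexicon lookup can be illustrated with a deliberately simplified sketch; the phrases, sentiment values and the greedy, contiguous longest-match strategy below are hypothetical toy choices and do not reproduce the published SePL algorithm, which also handles phrases interrupted by other words.

    # Toy sentiment phrase list: multi-word phrases carry their own scores,
    # so "nicht schön" is not derived by inverting the score of "schön".
    SEPL = {
        ("schön",): 0.8,
        ("nicht", "schön"): -0.6,
        ("sehr", "schön"): 0.9,
    }

    def score(tokens):
        """Greedy longest-match scoring against the phrase list."""
        i, scores = 0, []
        max_len = max(len(p) for p in SEPL)
        while i < len(tokens):
            for length in range(max_len, 0, -1):      # prefer longer phrases
                phrase = tuple(tokens[i:i + length])
                if phrase in SEPL:
                    scores.append(SEPL[phrase])
                    i += length
                    break
            else:
                i += 1                                # no phrase starts here
        return sum(scores) / len(scores) if scores else 0.0

    print(score("das ist nicht schön".split()))      # -> -0.6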
Result 3: Evaluation of the application of the SePL
The algorithm from Result 2 was evaluated with reviews from the rating platform Ciao in the domain of car insurance. In the process, major sources of error were identified [176], which enable corresponding improvements. Furthermore, an evaluation with the SePL was carried out using a machine learning method based on a support vector machine. Various existing lexical resources were compared with the SePL, and their use in different domains was investigated. The results were published in [115].
Result 4: Research project PoliTwi – detection of top political topics
In the research project PoliTwi, on the one hand, the required data were collected from Twitter; on the other hand, current top political topics are continuously made available to the general public via various channels. For the evaluation of the intended improvements in topic detection in combination with opinion analysis, the required data from the domain of politics are available for a period of (so far) three years. Topic detection was carried out on the basis of these data. The computed topics were compared with other systems such as Google Trends or Tagesschau Meta (see chapter 5.3). It could be shown that opinion analysis can improve topic detection. The results of the project were published in [124]. In addition, a service is provided to the public, and in particular to journalists and politicians (among other things via the Twitter channel at https://twitter.com/politwi), through which they are informed about current top topics. News portals such as FOCUS Online have used this service in their reporting (see chapter 4.3.6.1). The top topics have been determined since mid-2013 and can also be retrieved from the project website [119].
Result 5: Extension of lexical resources at the concept level
The still young research area of concept-level sentiment analysis tries to improve previous approaches to opinion analysis by analysing opinion expressions at the concept level. One prerequisite are lists of opinion-bearing words that allow differentiated assessments depending on the context. Based on the top topics and their context, a procedure was developed that enables the creation or extension of such lists. It was shown how opinions can be assessed in a differentiated way in different contexts and how this information can be incorporated into lexical resources, which can then be used in concept-level sentiment analysis. The procedure was published in [124].
Students of computer science enter university education with very different competencies, experience and knowledge. Data from 145 freshman computer science students, collected by learning management systems and combined with exam outcomes and learning dispositions data (e.g. student dispositions, previous experiences and attitudes measured through self-reported surveys), have been exploited to identify indicators that predict academic success and hence to make effective interventions for dealing with an extremely heterogeneous group of students.
The 8th Fachtagung für Hochschuldidaktik der Informatik (HDI), the German conference on computer science education in higher education, took place in September 2018 in Frankfurt, together with the Deutsche E-Learning Fachtagung Informatik (DeLFI), under the joint motto "Digitalisierungswahnsinn? – Wege der Bildungstransformationen" ("Digitalization madness? – Paths of educational transformation").
The HDI is devoted to all questions of computer science education in the higher education sector. This year's focus areas included, among others:
– Analysis of the content and competencies to be aimed for in computer science courses
– Learning to program & getting started with software development
– Special topics: data science, theoretical computer science and scientific work
The conference addresses selected questions from these thematic areas, which are treated in depth in talks by renowned experts and in submitted contributions.
Background: Although mortality after cardiac surgery has significantly decreased in the last decade, patients still experience clinically relevant postoperative complications. Among others, atrial fibrillation (AF) is a common consequence of cardiac surgery, which is associated with prolonged hospitalization and increased mortality.
Methods: We retrospectively analyzed data from patients who underwent coronary artery bypass grafting, valve surgery or a combination of both at the University Hospital Muenster between April 2014 and July 2015. We evaluated the incidence of new onset and intermittent/permanent AF (patients with pre- and postoperative AF). Furthermore, we investigated the impact of postoperative AF on clinical outcomes and evaluated potential risk factors.
Results: In total, 999 patients were included in the analysis. New onset AF occurred in 24.9% of the patients and the incidence of intermittent/permanent AF was 59.5%. Both types of postoperative AF were associated with prolonged ICU length of stay (median increase approx. 2 days) and duration of mechanical ventilation (median increase 1 h). Additionally, new onset AF patients had a higher rate of dialysis and hospital mortality and more positive fluid balance on the day of surgery and postoperative days 1 and 2. In a multiple logistic regression model, advanced age (odds ratio (OR) = 1.448 per decade increase, p < 0.0001), a combination of CABG and valve surgery (OR = 1.711, p = 0.047), higher C-reactive protein (OR = 1.06 per unit increase, p < 0.0001) and creatinine plasma concentration (OR = 1.287 per unit increase, p = 0.032) significantly predicted new onset AF. Higher Horowitz index values were associated with a reduced risk (OR = 0.996 per unit increase, p = 0.012). In a separate model, higher plasma creatinine concentration (OR = 2.125 per unit increase, p = 0.022) was a significant risk factor for intermittent/permanent AF whereas higher plasma phosphate concentration (OR = 0.522 per unit increase, p = 0.003) indicated reduced occurrence of this arrhythmia.
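As a generic reading, not taken from the paper, of how an odds ratio reported "per decade increase" relates to the underlying logistic regression coefficient, assuming age enters the model in years:

    \mathrm{OR}_{\mathrm{decade}} \;=\; e^{10\,\beta_{\mathrm{age}}} \;=\; 1.448
    \quad\Longrightarrow\quad
    \beta_{\mathrm{age}} \;=\; \tfrac{1}{10}\ln(1.448) \;\approx\; 0.037 \ \text{per year of age.}

If age instead enters the model in decades, the coefficient is simply ln(1.448) ≈ 0.370 per decade; the reported odds ratio is the same in both cases.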
Conclusions: New onset and intermittent/permanent AF are associated with adverse clinical outcomes of elective cardiac surgery patients. Different risk factors implicated in postoperative AF suggest different mechanisms might be involved in its pathogenesis. Customized clinical management protocols seem to be warranted for a higher success rate of prevention and treatment of postoperative AF.
The transverse momentum distributions of the strange and double-strange hyperon resonances (Σ(1385)±, Ξ(1530)0) produced in p–Pb collisions at √s_NN = 5.02 TeV were measured in the rapidity range −0.5 < y_CMS < 0 for event classes corresponding to different charged-particle multiplicity densities, ⟨dN_ch/dη_lab⟩. The mean transverse momentum values are presented as a function of ⟨dN_ch/dη_lab⟩, as well as a function of the particle masses, and compared with previous results on hyperon production. The integrated yield ratios of excited to ground-state hyperons are constant as a function of ⟨dN_ch/dη_lab⟩. The equivalent ratios to pions exhibit an increase with ⟨dN_ch/dη_lab⟩, depending on their strangeness content.
Motivation: Arabidopsis thaliana is a well-established model system for the analysis of the basic physiological and metabolic pathways of plants. Nevertheless, the system is not yet fully understood, although many mechanisms are described and information for many processes exists. However, the combination and interpretation of the large amount of biological data remain a big challenge, not only because data sets for metabolic pathways are still incomplete. Moreover, they are often inconsistent, because they come from different experiments of various scales, regarding, for example, accuracy and/or significance. Here, theoretical modeling is a powerful means to formulate hypotheses for pathways and the dynamics of the metabolism, even if the biological data are incomplete. To be reliable, mathematical models have to be checked for consistency. This is still a challenging task, because many verification techniques already fail for medium-sized models. Consequently, new methods, such as decomposition methods or reduction approaches, are developed to circumvent this problem.
Methods: We present a new semi-quantitative mathematical model of the metabolism of Arabidopsis thaliana. We used the Petri net formalism to express the complex reaction system in a mathematically unambiguous manner. To verify the model for correctness and consistency we applied concepts of network decomposition and network reduction such as transition invariants, common transition pairs, and invariant transition pairs.
Results: We formulated the core metabolism of Arabidopsis thaliana based on recent knowledge from literature, including the Calvin cycle, glycolysis and citric acid cycle, glyoxylate cycle, urea cycle, sucrose synthesis, and the starch metabolism. By applying network decomposition and reduction techniques at steady-state conditions, we suggest a straightforward mathematical modeling process. We demonstrate that potential steady-state pathways exist, which provide the fixed carbon to nearly all parts of the network, especially to the citric acid cycle. There is a close cooperation of important metabolic pathways, e.g., the de novo synthesis of uridine-5-monophosphate, the γ-aminobutyric acid shunt, and the urea cycle. The presented approach extends the established methods for a feasible interpretation of biological network models, in particular of large and complex models.
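The transition invariants mentioned in the Methods are, in standard Petri net terms, non-negative vectors x with C·x = 0 for the incidence matrix C. A minimal sketch of how such invariants can be obtained from the nullspace of C is given below; the tiny two-transition net is a hypothetical toy example, not part of the Arabidopsis model.

    import sympy as sp

    # Incidence matrix C (rows: places, columns: transitions).
    # Toy net: t1 produces a token on place p, t2 consumes it again.
    C = sp.Matrix([[1, -1]])

    # T-invariants are (suitably scaled) non-negative elements of the nullspace.
    for vec in C.nullspace():
        scale = sp.ilcm(*[entry.q for entry in vec])   # clear denominators
        print((vec * scale).T)                         # -> [1, 1]: firing t1 once and
                                                       #    t2 once reproduces the marking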
This volume contains the papers presented at the First International Workshop on Rewriting Techniques for Program Transformations and Evaluation (WPTE 2014) which was held on July 13, 2014 in Vienna, Austria during the Vienna Summer of Logic 2014 (VSL 2014) as a workshop of the Sixth Federated Logic Conference (FLoC 2014). WPTE 2014 was affiliated with the 25th International Conference on Rewriting Techniques and Applications joined with the 12th International Conference on Typed Lambda Calculi and Applications (RTA/TLCA 2014).
Background: Signal transduction pathways are important cellular processes for maintaining the cell's integrity. Their imbalance can cause severe pathologies. As signal transduction pathways feature complex regulations, they form intertwined networks. Mathematical models aim to capture their regulatory logic and allow an unbiased analysis of the robustness and vulnerability of the signaling network. Pathway detection is still a challenge for the analysis of signaling networks in the field of systems biology, and a rigorous mathematical formalism for identifying all possible signal flows in a network model has been lacking.
Results: In this paper, we introduce the concept of Manatee invariants for the analysis of signal transduction networks. We present an algorithm for the characterization of the combinatorial diversity of signal flows, e.g., from signal reception to cellular response. We demonstrate the concept for a small model of the TNFR1-mediated NF-κB signaling pathway. Manatee invariants reveal all possible signal flows in the network. Further, we show the application of Manatee invariants for in silico knockout experiments. Here, we illustrate the biological relevance of the concept.
Conclusions: The proposed mathematical framework reveals the entire variety of signal flows in models of signaling systems, including cyclic regulations. Thereby, Manatee invariants allow for the analysis of robustness and vulnerability of signaling networks. The applicability to further analyses, such as in silico knockout studies, was shown. The new framework of Manatee invariants contributes to an advanced examination of signaling systems.
Exploring biophysical properties of virus-encoded components and their requirement for virus replication is an exciting new area of interdisciplinary virological research. To date, spatial resolution has only rarely been analyzed in computational/biophysical descriptions of virus replication dynamics. However, it is widely acknowledged that intracellular spatial dependence is a crucial component of virus life cycles. The hepatitis C virus-encoded NS5A protein is an endoplasmic reticulum (ER)-anchored viral protein and an essential component of the virus replication machinery. Therefore, we simulate NS5A dynamics on realistically reconstructed, curved ER surfaces by means of surface partial differential equations (sPDE) upon unstructured grids. We match the in silico NS5A diffusion constant such that the NS5A sPDE simulation data reproduce experimental NS5A fluorescence recovery after photobleaching (FRAP) time series data. This parameter estimation yields the NS5A diffusion constant. Such parameters are needed for spatial models of HCV dynamics, which we are developing in parallel but which remain qualitative at this stage. Thus, our present study likely provides the first quantitative biophysical description of the movement of a viral component. Our spatio-temporally resolved ansatz paves new ways for understanding intricate spatially defined processes central to specific aspects of virus life cycles.
Automated deduction in higher-order program calculi, where properties of transformation rules are demanded, or confluence or other equational properties are requested, can often be done by syntactically computing overlaps (critical pairs) of reduction rules and transformation rules. Since higher-order calculi have alpha-equivalence as their fundamental equivalence, the reasoning procedure must deal with it. We define ASD1-unification problems, which are higher-order equational unification problems employing variables for atoms, expressions and contexts, with additional distinct-variable constraints, and which have to be solved w.r.t. alpha-equivalence. Our proposal is to extend nominal unification to solve these unification problems. We succeeded in constructing the nominal unification algorithm NomUnifyASC. We show that NomUnifyASC is sound and complete for this problem class, and outputs a set of unifiers with constraints in nondeterministic polynomial time if the final constraints are satisfiable. We also show that solvability of the output constraints can be decided in NEXPTIME, and for a fixed number of context-variables in NP. For terms without context-variables and atom-variables, NomUnifyASC runs in polynomial time, is unitary, and extends the classical problem by permitting distinct-variable constraints.
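The nominal treatment of alpha-equivalence that underlies approaches such as the one above can be illustrated on ground lambda terms; the following is a textbook-style sketch of alpha-equivalence via name swapping and freshness, not the NomUnifyASC algorithm itself.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Var:
        name: str

    @dataclass(frozen=True)
    class Lam:
        binder: str
        body: object

    @dataclass(frozen=True)
    class App:
        fun: object
        arg: object

    def swap(a, b, t):
        """Apply the atom transposition (a b) to every name occurring in t."""
        sw = lambda x: b if x == a else a if x == b else x
        if isinstance(t, Var):
            return Var(sw(t.name))
        if isinstance(t, Lam):
            return Lam(sw(t.binder), swap(a, b, t.body))
        return App(swap(a, b, t.fun), swap(a, b, t.arg))

    def free_atoms(t):
        if isinstance(t, Var):
            return {t.name}
        if isinstance(t, Lam):
            return free_atoms(t.body) - {t.binder}
        return free_atoms(t.fun) | free_atoms(t.arg)

    def alpha_eq(s, t):
        if isinstance(s, Var) and isinstance(t, Var):
            return s.name == t.name
        if isinstance(s, App) and isinstance(t, App):
            return alpha_eq(s.fun, t.fun) and alpha_eq(s.arg, t.arg)
        if isinstance(s, Lam) and isinstance(t, Lam):
            if s.binder == t.binder:
                return alpha_eq(s.body, t.body)
            # lam a. s'  ~  lam b. t'   iff   a is fresh for t'  and  s' ~ (a b).t'
            return (s.binder not in free_atoms(t.body)
                    and alpha_eq(s.body, swap(s.binder, t.binder, t.body)))
        return False

    print(alpha_eq(Lam("x", Var("x")), Lam("y", Var("y"))))   # True
    print(alpha_eq(Lam("x", Var("z")), Lam("z", Var("z"))))   # False: would capture z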
1998 ACM Subject Classification F.4.1 Mathematical Logic
We propose a model for measuring the runtime of concurrent programs by the minimal number of evaluation steps. The focus of this paper is on improvements, which are program transformations that improve this number in every context, where we distinguish between sequential and parallel improvements, for one or more processors, respectively. We apply the methods to CHF, a model of Concurrent Haskell extended by futures. The language CHF is a typed higher-order functional language with concurrent threads, monadic IO and MVars as synchronizing variables. We show that all deterministic reduction rules and 15 further program transformations are sequential and parallel improvements. We also show that the introduction of deterministic parallelism is a parallel improvement, and its inverse a sequential improvement, provided it is applicable. This is a step towards more automated precomputation of concurrent programs at compile time that is formally proven to optimize correctly.
Data structures and advanced models of computation on big data : report from Dagstuhl seminar 14091
(2014)
This report documents the program and the outcomes of Dagstuhl Seminar 14091 "Data Structures and Advanced Models of Computation on Big Data". In today's computing environment vast amounts of data are processed, exchanged and analyzed. The manner in which information is stored profoundly influences the efficiency of these operations over the data. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances.
The seminar covered both recent advances in the "classical" data structuring topics as well as new models of computation adapted to modern architectures, scientific studies that reveal the need for such models, applications where large data sets play a central role, modern computing platforms for very large data, and new data structures for large data in modern architectures.
The extended abstracts included in this report both present recent state-of-the-art advances and lay the foundation for new directions within data structures research.
Synaptic release sites are characterized by exocytosis-competent synaptic vesicles tightly anchored to the presynaptic active zone (PAZ), whose proteome orchestrates the fast signaling events involved in the synaptic vesicle cycle and plasticity. Allocation of the amyloid precursor protein (APP) to the PAZ proteome implicated a functional impact of APP on neuronal communication. In this study, we combined state-of-the-art proteomics, electrophysiology and bioinformatics to address protein abundance and functional changes at the native hippocampal PAZ in young and old APP-KO mice. We evaluated whether APP deletion has an impact on the metabolic activity of presynaptic mitochondria. Furthermore, we quantified differences in the phosphorylation status after long-term potentiation (LTP) induction at the purified native PAZ. We observed an increase in the phosphorylation of the signaling enzyme calmodulin-dependent kinase II (CaMKII) only in old APP-KO mice. During aging, APP deletion is accompanied by a severe decrease in metabolic activity and hyperphosphorylation of CaMKII. This attributes an essential functional role to APP at the hippocampal PAZ and suggests putative molecular mechanisms underlying the age-dependent impairments in learning and memory in APP-KO mice.
Random graph models, originally conceived to study the structure of networks and the emergence of their properties, have become an indispensable tool for experimental algorithmics. Amongst them, hyperbolic random graphs form a well-accepted family, yielding realistic complex networks while being both mathematically and algorithmically tractable. We introduce two generators MemGen and HyperGen for the G_{alpha,C}(n) model, which distributes n random points within a hyperbolic plane and produces m=n*d/2 undirected edges for all point pairs close by; the expected average degree d and exponent 2*alpha+1 of the power-law degree distribution are controlled by alpha>1/2 and C. Both algorithms emit a stream of edges which they do not have to store. MemGen keeps O(n) items in internal memory and has a time complexity of O(n*log(log n) + m), which is optimal for networks with an average degree of d=Omega(log(log n)). For realistic values of d=o(n / log^{1/alpha}(n)), HyperGen reduces the memory footprint to O([n^{1-alpha}*d^alpha + log(n)]*log(n)). In an experimental evaluation, we compare HyperGen with four generators among which it is consistently the fastest. For small d=10 we measure a speed-up of 4.0 compared to the fastest publicly available generator increasing to 29.6 for d=1000. On commodity hardware, HyperGen produces 3.7e8 edges per second for graphs with 1e6 < m < 1e12 and alpha=1, utilising less than 600MB of RAM. We demonstrate nearly linear scalability on an Intel Xeon Phi.
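To clarify the model rather than the paper's streaming generators, a naive quadratic sampler for the threshold hyperbolic random graph model can be sketched as follows; the radial inverse-CDF and the distance formula are the standard ones for the hyperbolic plane, and the chosen parameters are illustrative only. The actual MemGen and HyperGen algorithms avoid exactly this O(n^2) pairwise check.

    import math, random

    def sample_hrg(n, alpha=0.75, C=0.0, seed=1):
        """Naive O(n^2) sampler: points in a hyperbolic disc of radius R,
        connected whenever their hyperbolic distance is at most R."""
        rng = random.Random(seed)
        R = 2 * math.log(n) + C
        pts = []
        for _ in range(n):
            u = rng.random()
            # invert the radial CDF (cosh(alpha*r)-1)/(cosh(alpha*R)-1)
            r = math.acosh(1 + u * (math.cosh(alpha * R) - 1)) / alpha
            pts.append((r, rng.uniform(0, 2 * math.pi)))
        edges = []
        for i in range(n):
            for j in range(i + 1, n):
                (ri, ti), (rj, tj) = pts[i], pts[j]
                cosh_d = (math.cosh(ri) * math.cosh(rj)
                          - math.sinh(ri) * math.sinh(rj) * math.cos(ti - tj))
                if math.acosh(max(cosh_d, 1.0)) <= R:   # clamp against rounding
                    edges.append((i, j))
        return edges

    print(len(sample_hrg(2000)))   # the degree distribution is governed by alpha and C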
Interest in becoming a data scientist or in pursuing related professions in the data science domain is growing rapidly. To meet this demand, we propose a novel educational service that aims to provide tailored learning paths for data science. Our target user is someone who aims to become an expert in data science. Our approach is to analyze the background of the practitioner and match suitable learning units. A critical feature is that we use gamification to reinforce practitioner engagement. We believe that our work provides a practical guideline for those who want to learn data science.
Exhaustive, automatic testing of dataflow (esp. mapreduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints, and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. "concolic") execution in a conventional programming language, adapted to the unique features of the dataflow domain.
In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach.
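The underlying idea of deriving covering inputs from symbolic constraints can be illustrated with a small sketch using the z3 SMT solver's Python bindings; the Pig-style filter condition "age > 30" and the domain bounds are hypothetical, and the sketch is not SEDGE itself, only the constraint-solving step it builds on.

    from z3 import Int, Solver, sat

    age = Int("age")

    # For a filter operator, request one record that passes and one that fails,
    # so both outcomes of the operator are exercised by example data.
    for label, constraint in [("passing", age > 30), ("failing", age <= 30)]:
        s = Solver()
        s.add(constraint, age >= 0, age <= 120)   # plausible domain bounds
        if s.check() == sat:
            print(label, "record: age =", s.model()[age])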
The work presented in this thesis aims to raise the degree of automation in analog circuit design. To this end, a framework was developed that provides the necessary mechanisms to carry out a fully automated analog circuit synthesis, i.e., the construction of an analog circuit fulfilling all previously defined (electrical) specifications. Nowadays, analog circuit design in general is a very time-consuming process compared to a digital design flow. Due to its discrete nature, the digital design process is highly automated and thus very efficient compared to analog circuit design. In modern Very-Large-Scale Integration (VLSI) circuits the analog parts are mostly just a small portion of the overall chip area, yet this small portion is known to consume a major part of the required design effort. Paired with product cycles that constantly get shorter, the time needed to develop the analog parts of an integrated circuit (IC) becomes a determining factor. Apart from this, the ongoing progress in semiconductor processing technologies promises more speed with less power consumption on smaller areas, forcing IC developers to keep track with the technology nodes in order to maintain competitiveness. Analog circuitry exhibits the inherent property of being hard to reuse, as porting from one technology node to another imposes critical changes to the operating conditions (e.g., supply voltage), mostly leading to a full redesign of most of the analog modules. This productivity gap between digital and analog design represents the primary motivation for this thesis. Due to the availability of commercial sizing tools, this work deliberately focuses on the construction of circuit topologies, in distinction to parameter synthesis, which can be obtained with a dedicated sizing tool. The focus on circuit construction allows the development of a framework that enables a full design space exploration. This thesis describes the concepts and methods needed to realize a deterministic, explorative analog synthesis framework. In addition, a reference implementation is presented, which demonstrates the applicability in current analog design flows.
This paper describes the ongoing efforts of the authors to present ancient Greek and Roman numismatic data on the public internet, with an emphasis on efforts to integrate information from multiple sources using Linked Data and Semantic Web techniques. By way of very modern metaphor, it is useful to think of coins as intentionally created packages of 'named entities'. Each coin was struck by a particular authority, often at a known site, and coins often make reference to familiar concepts such as deities, historical events, or symbols that were widely recognized in the ancient world. The institutions represented among the authors have deployed search interfaces that allow users to take advantage of this aspect of numismatic databases. The American Numismatic Society's database provides faceted search to its collection of over 550,000 objects. The Portable Antiquities Scheme (PAS) in the UK presents individual finds (and hoards) recorded throughout the country. The Römisch-Germanische Kommission and the University of Frankfurt (DBIS) are developing a prototype metaportal (INTERFACE) that accesses national databases of coin finds held in Frankfurt, Vienna and Utrecht. Each of these resources is beginning to explore Semantic Web/Linked data approaches so that the role of numismatic standards is immediately coming to the fore. DBIS and INTERFACE are developing a numismatic ontology. At the ANS and PAS, the public database already presents RDF serializations based on Dublin Core. Together, the authors have begun to explore standardization of conceptual names on the basis of the vocabulary presented at the site http://nomisma.org. Nomisma.org is a collaborative effort to provide stable digital representations of numismatic concepts and entities. It provides URIs for such basic concepts as 'coin', 'mint', 'axis'. All of these are defined within the scope of numismatics but are already being linked to other stable resources where available. This is particularly the case for mints. For example, the URI http://nomisma.org/id/corinth is intended to represent that ancient city in its role as a minter/issuer of coins. The URI is linked via the SKOS ontology to the Pleiades Gazetteer of ancient places. This allows Nomisma to be the basis for a common representation of the concept that an object is a coin minted at Corinth. The ANS has already deployed such relationships in its public database. The work of all these projects is very much in progress, so this paper hopes to generate discussion on how multiple large projects can move forward in their own work while encouraging sufficient commonality to support large-scale research questions undertaken by diverse audiences.
Spam detection in wikis
(2012)
Through their collaborative properties, wikis have contributed significantly to the emergence of Web 2.0: the collaboration of many users has made it possible to prepare and compile large amounts of data in a structured way. A treasure trove of data has thus grown that is valuable for the machine processing of text: using text mining techniques, a great deal of information can be extracted from wikis. To this end, it is first useful to download their contents and store them locally.
There are often no access restrictions for editing pages. This enables the aforementioned accumulation of information, since many users can participate. However, it also carries the risk of wikis being polluted by spam, which is a hindrance to their use as a knowledge base.
Common anti-spam measures are applied online and rely, among other things, on monitoring by users or on blacklists for web links. In contrast, the following approach is chosen in this thesis: a locally stored wiki is inventoried and examined in its entirety. Only the contents of the pages are taken into account. Spam detection is based on a combination of decision rules and word probabilities. Good results could be achieved with this approach.
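A generic illustration of spam scoring via word probabilities, in the style of a naive Bayes log-likelihood ratio with Laplace smoothing, is sketched below; the tiny training data is invented, and the thesis additionally combines such word probabilities with decision rules that are not reproduced here.

    import math
    from collections import Counter

    spam_docs = ["cheap pills buy now", "buy cheap watches now"]
    ham_docs  = ["the article explains the algorithm", "wiki page about graph theory"]

    def word_counts(docs):
        return Counter(w for d in docs for w in d.split())

    spam_c, ham_c = word_counts(spam_docs), word_counts(ham_docs)
    vocab = set(spam_c) | set(ham_c)

    def score(text):
        """Positive score -> more spam-like, negative -> more ham-like."""
        s = 0.0
        for w in text.split():
            p_spam = (spam_c[w] + 1) / (sum(spam_c.values()) + len(vocab))
            p_ham  = (ham_c[w] + 1) / (sum(ham_c.values()) + len(vocab))
            s += math.log(p_spam / p_ham)
        return s

    print(score("buy cheap pills"))             # > 0: looks like spam
    print(score("article about graph theory"))  # < 0: looks like regular content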
We study Gaifman locality and Hanf locality of an extension of first-order logic with modulo p counting quantifiers (FO+MODp, for short) with arbitrary numerical predicates. We require that the validity of formulas is independent of the particular interpretation of the numerical predicates and refer to such formulas as arb-invariant formulas. This paper gives a detailed picture of locality and non-locality properties of arb-invariant FO+MODp. For example, on the class of all finite structures, for any p ≥ 2, arb-invariant FO+MODp is neither Hanf nor Gaifman local with respect to a sublinear locality radius. However, in case that p is an odd prime power, it is weakly Gaifman local with a polylogarithmic locality radius. And when restricting attention to the class of string structures, for odd prime powers p, arb-invariant FO+MODp is both Hanf and Gaifman local with a polylogarithmic locality radius. Our negative results build on examples of order-invariant FO+MODp formulas presented in Niemistö's PhD thesis. Our positive results make use of the close connection between FO+MODp and Boolean circuits built from NOT-gates and AND-, OR-, and MODp-gates of arbitrary fan-in.
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple 'k-best decoding plus dictionary lookup' strategy performs in this context and find that such an approach can significantly outdo baselines such as edit distance, weighted edit distance, and the noisy channel model of Brill and Moore for spelling error correction. We also consider elementary combination techniques for our models such as language model weighted majority voting and center string combination. Finally, we consider real-world OCR post-correction for a dataset sampled from medieval Latin texts.
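A minimal sketch of the 'k-best decoding plus dictionary lookup' idea, using plain unit-cost edit distance against a toy dictionary; the paper's models are learned string-to-string transformation models, so this is only meant to clarify the overall strategy, and the dictionary, input and k are hypothetical.

    def edit_distance(a: str, b: str) -> int:
        """Standard dynamic-programming Levenshtein distance (single row)."""
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1,          # delete from a
                                         dp[j - 1] + 1,      # insert into a
                                         prev + (ca != cb))  # substitute
        return dp[-1]

    def k_best(word, dictionary, k=3):
        """k most plausible corrections, then a downstream model picks one."""
        return sorted(dictionary, key=lambda w: edit_distance(word, w))[:k]

    DICT = ["dominus", "domus", "dolus", "donum"]
    print(k_best("domnus", DICT))   # e.g. ['dominus', 'domus', ...]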
Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities - bridging the gap between Computer Science and Digital Humanities”.
1998 ACM Subject Classification I.2.7 Natural Language Processing, J.5 Arts and Humanities
In order to promote the accessibility of biodiversity data in historic and contemporary literature, we introduce a new interdisciplinary project called BIOfid (FID=Fachinformationsdienst, a service for providing specialized information). The project aims at a mobilization of data available in print only by combining digitization of scientific biodiversity literature with the development of innovative text mining tools for complex, eventually semantic searches throughout the complete text corpus. A major prerequisite for the development of such search tools is the provision of sophisticated anatomy ontologies on the one hand, and of complete lists of species names (currently considered valid as well as all synonyms) at a global scale on the other hand. In the initial stage, we chose examples from German publications of the past 250 years dealing with the geographic distribution and ecology of vascular plants (Tracheophyta), birds (Aves), as well as moths and butterflies (Lepidoptera) in Germany. These taxa have been prioritized according to current demands of German research groups (about 50 sites) aiming at analyses and modeling of distribution patterns and their changes through time. In the long term, we aim at providing data and open source software applicable for any taxon and geographic region. For this purpose, a platform for open access journals for long-term availability of professional e-journals will be established. All generated data will also be made accessible through GFBio (German Federation for Biological Data). BIOfid is supported by the LIS-Scientific Library Services and Information Systems program of the German Research Foundation (DFG).
We present results on transverse momentum (pT) and rapidity (y) differential production cross sections, mean transverse momentum and mean transverse momentum squared of inclusive J/ψ and ψ(2S) at forward rapidity (2.5 < y < 4), as well as ψ(2S)-to-J/ψ cross section ratios. These quantities are measured in pp collisions at centre-of-mass energies √s = 5.02 and 13 TeV with the ALICE detector. Both charmonium states are reconstructed in the dimuon decay channel, using the muon spectrometer. A comprehensive comparison to inclusive charmonium cross sections measured at √s = 2.76, 7 and 8 TeV is performed. A comparison to non-relativistic quantum chromodynamics and fixed-order next-to-leading logarithm calculations, which describe prompt and non-prompt charmonium production respectively, is also presented. A good description of the data is obtained over the full pT range, provided that both contributions are summed. In particular, it is found that for pT > 15 GeV/c the non-prompt contribution reaches up to 50% of the total charmonium yield.
We explore space improvements in LRP, a polymorphically typed call-by-need functional core language. A relaxed space measure is chosen for the maximal size usage during an evaluation. It abstracts from the details of the implementation via abstract machines, but it takes garbage collection into account and thus can be seen as a realistic approximation of space usage. The results are: a context lemma for space improving translations and for space equivalences; all but one reduction rule of the calculus are shown to be space improvements, and the exceptional one, the copy-rule, is shown to increase space only moderately.
Several further program transformations are shown to be space improvements or space equivalences; in particular, the translation into machine expressions is a space equivalence. These results are a step forward in making predictions about the change in runtime space behavior of optimizing transformations in call-by-need functional languages.
Motivated by tools for automated deduction on functional programming languages and programs, we propose a formalism to symbolically represent $\alpha$-renamings for meta-expressions. The formalism is an extension of the usual higher-order meta-syntax which allows all valid ground instances of a meta-expression to be $\alpha$-renamed to fulfill the distinct variable convention. The renaming mechanism may be helpful for several reasoning tasks in deduction systems. We present our approach for a meta-language which uses higher-order abstract syntax and a meta-notation for recursive let-bindings, contexts, and environments. It is used in the LRSX Tool -- a tool to reason on the correctness of program transformations in higher-order program calculi with respect to their operational semantics. Besides introducing a formalism to represent symbolic $\alpha$-renamings, we present and analyze algorithms for simplification of $\alpha$-renamings, matching, rewriting, and checking $\alpha$-equivalence of symbolically $\alpha$-renamed meta-expressions.
We introduce rewriting of meta-expressions which stem from a meta-language that uses higher-order abstract syntax augmented by meta-notation for recursive let, contexts, sets of bindings, and chain variables. Additionally, three kinds of constraints can be added to meta-expressions to express usual constraints on evaluation rules and program transformations. Rewriting of meta-expressions is required for automated reasoning on programs and their properties. A concrete application is a procedure to automatically prove correctness of program transformations in higher-order program calculi which may permit recursive let-bindings as they occur in functional programming languages. Rewriting on meta-expressions can be performed by solving the so-called letrec matching problem which we introduce. We provide a matching algorithm to solve it. We show that the letrec matching problem is NP-complete, that our matching algorithm is sound and complete, and that it runs in non-deterministic polynomial time.
This is a short summary of a recent survey [FR03] focusing on the observed evidence that Internet connectivity is positively correlated with the spread of democracy at high levels of significance. The results of the multivariate correlation analyses and probability regression estimation models are based on the combined analysis of mid-1991 to 2001 data series from Eurostat, the US Census Bureau, the World Bank, and the OECD's statistical data services, which track the growth of information technology and ratings of freedom and democracy worldwide.
We present an implementation of an interpreter LRPi for the call-by-need calculus LRP, based on a variant of Sestoft's abstract machine Mark 1, extended with an eager garbage collector. It is used as a tool for exact space usage analyses, supporting our investigations into space improvements of call-by-need calculi.
50 years of amino acid hydrophobicity scales: revisiting the capacity for peptide classification
(2016)
Background: Physicochemical properties are frequently analyzed to characterize protein sequences of known and unknown function. Especially the hydrophobicity of amino acids is often used for structural prediction or for the detection of membrane-associated or embedded β-sheets and α-helices. For this purpose many scales classifying amino acids according to their physicochemical properties have been defined over the past decades. In parallel, several hydrophobicity parameters have been defined for the calculation of peptide properties. We analyzed the performance of separating sequence pools using 98 hydrophobicity scales and five different hydrophobicity parameters, namely the overall hydrophobicity, the hydrophobic moment for detection of α-helical and β-sheet membrane segments, the alternating hydrophobicity, and the exact β-strand score.
Results: Most of the scales are capable of discriminating between transmembrane α-helices and transmembrane β-sheets, but assignment of peptides to pools of soluble peptides of different secondary structures is not achieved at the same quality. The separation capacity, as a measure of the discrimination between different structural elements, is best when using the five different hydrophobicity parameters, but adding the alternating hydrophobicity does not provide a large benefit. An in silico evolutionary approach shows that the scales are generally limited in their separation capacity, with a maximal threshold of 0.6. We observed that scales derived from the evolutionary approach performed best in separating the different peptide pools when the values for arginine and tyrosine were largely distinct from the value of glutamate. Finally, the separation of secondary structure pools via hydrophobicity can be supported by specific detectable patterns of four amino acids.
Conclusion: It can be assumed that the separation capacity of a given scale depends on the spacing of the hydrophobicity values of certain amino acids. Irrespective of the wealth of hydrophobicity scales, no scale exists that separates all different kinds of secondary structures or distinguishes between soluble and transmembrane peptides, reflecting that properties other than hydrophobicity also affect secondary structure formation. Nevertheless, applying hydrophobicity scales allows distinguishing between peptides with transmembrane α-helices and those with transmembrane β-sheets. Furthermore, the overall separation capacity score of 0.6 obtained with different hydrophobicity parameters can be assisted by a pattern search at the protein sequence level for specific peptides with a length of four amino acids.
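For illustration, the following Python sketch computes two of the hydrophobicity parameters mentioned above, the overall (mean) hydrophobicity and the hydrophobic moment, using the Kyte-Doolittle scale as one example scale and turn angles of 100° (α-helix) and 180° (β-strand). The exact parameter definitions and scale normalisations used in the study may differ:

```python
"""Sketch: mean hydrophobicity and hydrophobic moment (Eisenberg-style, per
residue) of a peptide under the Kyte-Doolittle scale.  The example peptide is
arbitrary and the formulas are one common variant, not necessarily the ones
used in the cited study."""

import math

KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def mean_hydrophobicity(seq, scale=KD):
    return sum(scale[a] for a in seq) / len(seq)

def hydrophobic_moment(seq, angle_deg=100.0, scale=KD):
    delta = math.radians(angle_deg)
    s = sum(scale[a] * math.sin(i * delta) for i, a in enumerate(seq))
    c = sum(scale[a] * math.cos(i * delta) for i, a in enumerate(seq))
    return math.hypot(s, c) / len(seq)

peptide = "GLFDIIKKIAESF"                               # arbitrary example
print(mean_hydrophobicity(peptide))
print(hydrophobic_moment(peptide, angle_deg=100.0))     # helical moment
print(hydrophobic_moment(peptide, angle_deg=180.0))     # beta-strand moment
```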
Magnetoencephalography (MEG) measures neural activity non-invasively and at an excellent temporal resolution. Since its invention (Cohen, 1968, 1972), MEG has proven a most valuable tool in neurocognitive (Salmelin et al., 1994) and clinical research (Stufflebeam et al., 2009; Van ’t Ent et al., 2003). MEG is able to measure rapid changes in electrophysiological neural signals related to sensory and cognitive processes. The magnetic fields measured outside the head by MEG directly reflect the cortical currents generated by the synchronised activity of thousands of neuronal sources. This distinguishes MEG from functional magnetic resonance imaging (fMRI), where measurements are only indirectly related to electrophysiological activity through neurovascular coupling...
Over the past two decades, the growing adoption of the Internet as a universal network for transporting data of all kinds has led to data volumes that traditional database systems can hardly process effectively anymore. One reason is that an ever larger share of the world's population has access to the Internet, for example via Internet-enabled smartphones, and wants to use its services. In addition, ever higher bandwidths available for Internet access contribute to the fact that the amount of information generated worldwide is now growing exponentially.
This has led to the development and implementation of technologies for processing these immense volumes of data effectively. These technologies can be grouped under the umbrella term "Big Data" and describe methods for processing structured and unstructured information in the terabyte to exabyte range, even in real time. Database systems serve as their foundation, since they are a proven and practical means of structuring, organizing, manipulating, and efficiently retrieving information. As mentioned above, it has turned out that traditional database systems based on the relational data model are now confronted with data volumes with which they do not scale well in terms of performance and energy consumption. This circumstance has led to the development of specialized database systems that implement other data and storage models and offer considerably higher performance for them.
In addition, database systems in the "Big Data" environment require considerably larger investments in the number of servers, which has led to the emergence of more and more large and very large data centers. In the meantime, the energy required to operate and cool these centers has become a significant cost factor. Accordingly, efforts have already been made to investigate the energy efficiency of database systems (the relation between performance and energy consumption) in more depth.
By now, more than 150 database systems are known, each with its own strengths and weaknesses regarding performance, energy consumption, and ultimately energy efficiency. End users of database systems thus face the difficult task of determining the most suitable database system for a given use case with respect to these factors. The reason is that hardly any objective and independent comparison figures exist to support this decision, and such figures are mostly obtained by running benchmarks on a wide variety of technical platforms. Obviously, running a benchmark repeatedly with many different parameters (among them the data volume, other combinations of hardware components, and the operating system) requires large investments in time and equipment in order to obtain comparison figures that are as broad as possible.
One option is to simulate the execution of a benchmark instead of actually running it, in order to minimize the investment in equipment and, above all, time. Such simulations also have the advantage that, for example, the developers of database systems can simulate the effects of architectural changes on performance and energy efficiency instead of having to evaluate them through lengthy regression tests. For such simulations to gain practical relevance, the difference between the simulated and the actually measured comparison metrics must of course be as small as possible. In addition, a suitable simulation must be able to reproduce as large a number of database systems and hardware components as possible.
This dissertation shows that such a simulation is realistic. To this end, in a first step the factors influencing the performance, energy consumption, and energy efficiency of a database system were identified and their effects were determined on the basis of experimental results. In addition, suitable metrics and general properties of database systems and of benchmarks were evaluated. In a second step, a suitable simulation model was developed and successively refined. At each development step, real experiments in the form of benchmark runs were carried out for a wide variety of database systems and technical platforms. These experiments were then reproduced with the simulation model in order to compute the difference between real and simulated benchmark results. The results of the last development step show that this difference is below eight percent. The dissertation also shows that the simulation model is suited not only to simulating established benchmarks but, more generally, to simulating a database system and the technical platform on which it runs. This also enables the simulation of other use cases, for example regression tests.
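As a minimal illustration of the kind of comparison described above, the following Python sketch computes the energy efficiency of a benchmark run as performance per power and the relative deviation between a simulated and a measured run. All names and numbers below are made up and do not come from the dissertation:

```python
"""Sketch: compare a simulated and a measured benchmark run via one possible
energy-efficiency metric (throughput divided by power, i.e. operations per
joule) and report the relative deviation.  Figures are purely illustrative."""

def energy_efficiency(throughput_ops_per_s, power_w):
    """Operations per joule: throughput [ops/s] divided by power [W = J/s]."""
    return throughput_ops_per_s / power_w

def relative_deviation(simulated, measured):
    return abs(simulated - measured) / measured

measured  = {"throughput": 12500.0, "power": 310.0}   # hypothetical measured run
simulated = {"throughput": 11900.0, "power": 295.0}   # hypothetical simulated run

eff_measured  = energy_efficiency(measured["throughput"], measured["power"])
eff_simulated = energy_efficiency(simulated["throughput"], simulated["power"])
print(f"deviation in energy efficiency: "
      f"{relative_deviation(eff_simulated, eff_measured):.1%}")
```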
The amyloid precursor protein (APP) was discovered in the 1980s as the precursor protein of the amyloid A4 peptide. The amyloid A4 peptide, also known as A-beta (Aβ), is the main constituent of the senile plaques implicated in Alzheimer's disease (AD). In association with the amyloid deposits, increasing impairments in learning and memory as well as the degeneration of neurons, especially in the hippocampal formation, are hallmarks of the pathogenesis of AD. Over the last decades much effort has been expended on understanding the pathogenesis of AD. However, little is known about the physiological role of APP within the central nervous system (CNS). Allocating APP to the proteome of the highly dynamic presynaptic active zone (PAZ) identified APP as a novel player within this neuronal communication and signaling network. The analysis of the hippocampal PAZ proteome derived from APP-mutant mice demonstrates that APP is tightly embedded in the underlying protein network. Strikingly, APP deletion accounts for major dysregulation within the PAZ proteome network. Ca2+ homeostasis, neurotransmitter release, and mitochondrial function are affected, resembling the changes observed during the pathogenesis of AD. The changes in protein abundance observed in the absence of APP as well as in AD suggest that APP is a structural and functional regulator within the hippocampal PAZ proteome. Within this review article, we intend to introduce APP as an important player within the hippocampal PAZ proteome and to outline the impact of APP deletion on individual PAZ proteome subcommunities.