Informatik
The synchronous pi-calculus is translated into a core language of Concurrent Haskell extended by futures (CHF). The translation simulates the synchronous message-passing of the pi-calculus by sending messages and adding synchronization using Concurrent Haskell's mutable shared-memory locations (MVars). The semantic criterion is a contextual semantics of the pi-calculus and of CHF using may- and should-convergence as observations. The results are equivalence with respect to the observations, full abstraction of the translation of closed processes, and adequacy of the translation on open processes. The translation transports the semantics of pi-calculus processes under rather strong criteria: error-free programs are translated into error-free ones, and programs without non-deterministic error possibilities are translated into programs without non-deterministic error possibilities. This investigation shows that CHF embraces the expressive power and the concurrency capabilities of the pi-calculus.
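The idea of building synchronous message-passing from one-place shared-memory cells can be sketched roughly as follows. This is not the translation from the paper and not CHF code: it is a Python emulation under the assumption that an MVar behaves like a one-slot blocking buffer (take blocks while it is empty, put blocks while it is full). The names `MVar`, `SyncChannel` and `demo` are illustrative only.

```python
import queue
import threading


class MVar:
    """One-place cell: put blocks while full, take blocks while empty."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)

    def put(self, value):
        self._slot.put(value)

    def take(self):
        return self._slot.get()


class SyncChannel:
    """Synchronous channel emulated with two MVar-like cells (hypothetical design)."""

    def __init__(self):
        self._message = MVar()            # carries the payload
        self._ack = MVar()                # signals that a receiver took the payload
        self._senders = threading.Lock()  # one sender completes its handshake at a time

    def send(self, value):
        with self._senders:
            self._message.put(value)
            self._ack.take()              # block until a receiver has taken the message

    def receive(self):
        value = self._message.take()
        self._ack.put(None)               # release the waiting sender
        return value


def demo():
    channel = SyncChannel()
    log = []

    def sender():
        channel.send("hello")
        log.append("send returned")

    worker = threading.Thread(target=sender)
    worker.start()
    log.append(("received", channel.receive()))
    worker.join()
    return log
```

In this sketch `send` only returns after some `receive` has taken the message, which is the synchronous behaviour the abstract refers to. The order of the two log entries is not fixed, because both threads continue independently after the handshake.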
The hepatitis C virus (HCV) RNA replication cycle is a dynamic intracellular process occurring in three-dimensional space (3D), which is difficult both to capture experimentally and to visualize conceptually. HCV-generated replication factories are housed within virus-induced intracellular structures termed membranous webs (MW), which are derived from the endoplasmic reticulum (ER). Recently, we published 3D spatiotemporally resolved diffusion–reaction models of the HCV RNA replication cycle by means of surface partial differential equation (sPDE) descriptions. We distinguished between the basic components of the HCV RNA replication cycle, namely HCV RNA, non-structural viral proteins (NSPs), and a host factor. In particular, we evaluated the sPDE models on realistic reconstructed intracellular compartments (ER/MW). In this paper, we propose a significant extension of the model based on two additional aspects: different aggregate states of HCV RNA and NSPs, and diffusion and reaction coefficients inspired by population dynamics instead of multilinear ones. The combination of both aspects enables realistic modeling of viral replication at all scales. Specifically, we describe a replication complex state consisting of HCV RNA together with a defined amount of NSPs. As a result of the combination of spatial resolution and different aggregate states, the new model mimics a cis requirement for HCV RNA replication. We used heuristic parameters for our simulations, which were run only on a subsection of the ER. Nevertheless, this was sufficient to allow the fitting of core aspects of virus reproduction, at least qualitatively. Our findings should help stimulate new model approaches and experimental directions for virology.
"Predictions are difficult, especially when they concern the future," as the saying goes. The last financial crisis is a good example, because very few analysts and economic experts saw it coming. Since financial crises are fortunately rare, however, it is difficult to develop models that warn of a crash in time.
Die wahrscheinlich beste Entscheidung : wie Online-Algorithmen mit der unsicheren Zukunft rechnen [Probably the best decision: how online algorithms reckon with an uncertain future]
(2018)
Is it worth buying skis as a beginner in a year with unreliable snow? Or is it cheaper to rent them? We often have to make decisions without having enough information about the future. This applies even more to computer systems that process large amounts of data and have to make fast decisions. So that they can work successfully despite a multitude of uncertainties, computer scientists develop online algorithms.
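The ski question is usually studied as the "ski rental problem". A minimal sketch of the classic break-even strategy follows, assuming integer prices and one rental fee per skiing day; the article does not say which algorithm it discusses, and the function names and prices are illustrative. The strategy rents as long as the rent paid so far stays below the purchase price and buys on the first day on which that would no longer hold. It never pays more than twice what a decision-maker who knew the future would pay.

```python
def break_even_cost(rent, buy, days_skied):
    """Total cost of the break-even strategy for a season with `days_skied` days.

    Rent on days 1 .. t-1 and buy on day t, where t = ceil(buy / rent).
    """
    buy_day = -(-buy // rent)  # integer ceiling of buy / rent
    if days_skied < buy_day:
        return days_skied * rent          # season ended before we bought
    return (buy_day - 1) * rent + buy     # rented t-1 days, then bought


def optimal_cost(rent, buy, days_skied):
    """Cost with perfect knowledge of the season length."""
    return min(days_skied * rent, buy)


if __name__ == "__main__":
    rent, buy = 30, 300  # hypothetical prices
    for days in (3, 10, 40):
        online = break_even_cost(rent, buy, days)
        offline = optimal_cost(rent, buy, days)
        print(days, online, offline, online / offline)
```

The ratio between the two costs stays below 2 for every season length, which is the kind of worst-case guarantee online algorithms aim for.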
Artificial intelligence (AI), that is, intelligent software, now performs tasks that were once thought possible only for humans. It has already arrived in many areas of our society: think of self-driving vehicles, medical diagnostics, translation programs, personal conversational assistants, search functions and robotics. But how far can we trust AI systems?
LSTMVoter : chemical named entity recognition using a conglomerate of sequence labeling tools
(2019)
Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier.
Results: We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores information about features that is modeled by means of an attention mechanism. LSTMVoter outperforms each extractor integrated by it in a series of experiments. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%.
Availability and implementation: Data and code are available at https://github.com/texttechnologylab/LSTMVoter.
With the Smart Learning Infrastructure, a novel didactic concept for continuing-education courses has been developed. This infrastructure can be applied in many ways. Initial analyses of courses show that participants who completed all exercises correctly achieve a grade better than the average. This contribution describes a concept for a gamification module that uses playful elements to motivate participants, as early as possible, to work through all exercises of a course correctly and with understanding.
We propose and create a new data model for learning-specific environments and learning analytics applications. It is motivated by experience with the Fiber Bundle Data Model, which is used for large time- and space-dependent data. Our proposed data model makes it easier to integrate file- or stream-based data structures from capturing devices. Learning analytics algorithms are added directly to the data, and queries and analytics are formulated in Python. The model is designed to improve collaboration in the field of learning analytics. We leverage a hierarchical data structure in which varying data is located near the leaves. Abstract data types are identified in four distinct pathways, which allow highly diverse data sources to be stored. We compare different implementations with regard to their memory footprint and performance. Our tests indicate that LeAn Bundles can be smaller than a naïve xAPI export. The benchmarks show that the performance is comparable to that of MongoDB, with the added benefit of being portable and extensible.
Digitale Kompetenzen von Hochschullehrenden messen : Validierungsstudie eines Kompetenzrasters [Measuring the digital competencies of university teachers: validation study of a competency grid]
(2018)
This contribution describes the development of a competency grid for assessing the digital competencies of university teachers and presents results of the grid's validation. To this end, the results of a pre-test (N=90) among participants of an e-learning qualification programme are evaluated using inferential statistics. In addition, for external validation of the competency grid, the results are compared with statements by the survey participants that were obtained from e-portfolios using qualitative methods. The scale analyses yielded clear single-factor solutions with good explained variance for six of the eight subdimensions of digital competence. The subscales have high internal consistency. Two dimensions split factor-analytically into further subtests, which also prove reliable in the test. Positive evidence for the validity of the competency grid was gathered through associations with statements from the e-portfolios.
Like a "doctrine of salvation", the term "digitalisation" is sweeping over almost all areas of life, naturally including education. We computer scientists in particular are called upon to help shape these paths of educational transformation. Together with educational scientists and psychologists, we must identify, demonstrate and implement in an exemplary way what is sensible and possible. We are the ones who must investigate and show the conditions for success and also those that lead astray. We do not need digitalisation madness!
The 16th annual conference DeLFI 2018 of the eLearning special interest group (Fachgruppe eLearning) of the Gesellschaft für Informatik e. V. takes place from 10 to 13 September 2018 at the Johann Wolfgang Goethe-Universität, Frankfurt am Main, together with the 8th conference on higher-education didactics of computer science, HDI 2018. ...
The endoplasmic reticulum–mitochondria encounter structure (ERMES) connects the mitochondrial outer membrane with the ER. Multiple functions have been linked to ERMES, including maintenance of mitochondrial morphology, protein assembly and phospholipid homeostasis. Since the mitochondrial distribution and morphology protein Mdm10 is present in both ERMES and the mitochondrial sorting and assembly machinery (SAM), it is unknown how the ERMES functions are connected on a molecular level. Here we report that conserved surface areas on opposite sides of the Mdm10 β-barrel interact with SAM and ERMES, respectively. We generated point mutants to separate protein assembly (SAM) from morphology and phospholipid homeostasis (ERMES). Our study reveals that the β-barrel channel of Mdm10 serves different functions. Mdm10 promotes the biogenesis of α-helical and β-barrel proteins at SAM and functions as integral membrane anchor of ERMES, demonstrating that SAM-mediated protein assembly is distinct from ER-mitochondria contact sites.
In this paper, we study the limit of compactness, a graph index originally introduced for measuring structural characteristics of hypermedia. When compactness was applied to large-scale small-world graphs (Mehler, 2008), its limit behaviour was observed to equal 1. The striking question raised by this finding was whether this limit behaviour resulted from the specifics of small-world graphs or was simply an artefact. In this paper, we determine the necessary and sufficient conditions under which a sequence of connected graphs yields a limit value of CB = 1; with some care, this result can be generalized to disconnected graph classes (Theorem 3). The result can be applied to many well-known classes of connected graphs. Here, we illustrate it by considering four examples. In fact, our proof-theoretical approach allows the limit value of compactness to be obtained quickly for many graph classes, sparing computational costs.
Virtual machines are for the most part not used in high-energy physics (HEP) environments. Although they provide a high degree of isolation, the performance overhead they introduce is too great. With the rising number of container technologies and their increasing isolation capabilities, HEP environments are evaluating whether they could use this technology. Container images are small and self-contained, which allows them to be distributed easily throughout the global environment. They also offer near-native performance while providing an often acceptable level of isolation. Only the needed services and libraries are packed into an image and executed directly by the host kernel. This work compared the performance impact of three container technologies: Docker, rkt and Singularity. The host kernel was additionally hardened with grsecurity and PaX to strengthen its security and make exploitation from inside a container harder. The execution time of a physics simulation was used as a benchmark. The results show that the container technologies differ in their impact on performance. The performance loss on a stock kernel is small; in some cases the containers were even faster than running without a container. Docker showed the best overall performance on a stock kernel. The difference on a hardened kernel was bigger than on a stock kernel, but in favor of the container technologies. rkt performed better than all the others in almost all cases.
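To make the index concrete, here is a small sketch that computes compactness for an undirected graph. It assumes the commonly cited definition from the hypermedia literature, C = (Max − Σ d(u,v)) / (Max − Min) over all ordered node pairs, with Max = (n² − n)·K, Min = n² − n, and a conversion constant K (taken here as n) substituted for unreachable pairs. The paper's exact definition of CB may differ in detail, and all function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    todo = deque([source])
    while todo:
        u = todo.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                todo.append(v)
    return dist


def compactness(adj, k=None):
    """Compactness of a graph given as {node: [neighbours]} (assumed definition)."""
    n = len(adj)
    if n < 2:
        raise ValueError("compactness needs at least two nodes")
    if k is None:
        k = n  # assumption: conversion constant K equals the number of nodes
    total = 0
    for u in adj:
        dist = bfs_distances(adj, u)
        for v in adj:
            if v != u:
                total += dist.get(v, k)  # unreachable pairs count as K
    max_sum = (n * n - n) * k
    min_sum = n * n - n
    return (max_sum - total) / (max_sum - min_sum)


def complete_graph(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def star_graph(n):
    """Node 0 is the centre, nodes 1 .. n-1 are leaves."""
    adj = {0: list(range(1, n))}
    for leaf in range(1, n):
        adj[leaf] = [0]
    return adj


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, compactness(complete_graph(n)), compactness(star_graph(n)))
```

Under this definition complete graphs have compactness 1 for every size, while star graphs only approach 1 as the number of nodes grows, which gives a simple example of a sequence of connected graphs with limit value 1.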
Web-based trainings (WBTs) are multimedia, interactive and thematically self-contained learning units in a browser. Since the emergence of the Internet in the 1990s, they have been an important and established building block in the design and development of eLearning scenarios. These learning units are usually created by teachers with suitable authoring systems. In rarer cases they are implemented as individually programmed one-off solutions. From the learners' perspective, content that was not explicitly created as a learning unit, yet exactly matches the needs of the individual learner, is increasingly being used (in the context of informal and self-directed learning). One reason is the growing availability and variety of "alternative learning content" on the Internet in general (free licences and innovative authoring tools). Another is the ability to find this content easily from anywhere and at any time (mobile Internet, search engines and voice assistants) or to have it classified and recommended (recommender systems and social media).
This change gives rise to the central question of this dissertation: does the concept of a dedicated WBT authoring system still meet the new requirements posed by freely available, interactive learning content (Khan Academy, YouTube and Wikipedia) and by a large and constantly growing number of free authoring tools for arbitrary web content (H5P, PowToon or Pageflow), and where exactly, in that case, do the unique selling points of a WBT lie?
To answer this question, the thesis examines in depth the term "Web-based Training", the framework conditions that have changed over time, and the resulting implications for the development of WBT authoring systems. Using the chosen design-based research (DBR) approach, continuous cycles of design, implementation, analysis and redesign across several eLearning projects made it possible to redefine or reinterpret the term WBT, so that the definition focuses on what fundamentally distinguishes WBTs from other content and functions on the Internet: the teaching/learning aspect (hereafter Web-based Training 2.0 (WBT 2.0)).
Based on this redefinition, four core functionalities were worked out that address the challenges mentioned above and are described in detail in the form of a design framework. The different aspects and functions of WBTs 2.0 were investigated and developed through the iterative "meso-cycles" of the DBR approach. Each project carried out within them also produced results of its own, which were discussed from didactic and, above all, technical perspectives. The insights gained fed into the development process of LernBar (the "macro-cycle"), a WBT authoring system developed within this thesis and by studiumdigitale, the central eLearning unit of Goethe-Universität. Throughout, the developments were continuously reviewed and refined on the basis of user feedback (annual user meetings, training courses, surveys, support).
The final development cycle of the DBR approach concludes with the design and implementation of three WBT 2.0 system components that allow arbitrary web content to be flexibly extended with WBT 2.0 functionalities, so that activities carried out in open teaching/learning processes also become transparent, traceable and thus verifiable (constructive alignment).
This research thus offers an interdisciplinary, user-centred and field-tested approach to implementing and using WBTs in the context of open teaching/learning processes. The focus shifts from pure media production to a holistic approach that puts the teaching/learning aspect first (identifying, meeting and verifying a learning need). Crucially, all available resources of the Internet can be used to meet a learning need; WBTs 2.0 merely define the didactic process and make it transparent and accessible to teachers and learners.
As a result, WBTs 2.0 will benefit from the growing variety and availability of content and functions on the Internet and allow developers of WBT 2.0 authoring systems to concentrate on the essentials: the teaching/learning process.
This thesis belongs to the field of data science. Data science uses methods from computer science, algorithms from mathematics and statistics, and domain knowledge to analyse large amounts of data and gain new insights. This thesis draws on several research areas from these fields: data analysis in the area of big data (social networks, short messages from Twitter), opinion mining (analysis of opinions based on a lexicon of opinion-bearing phrases) and topic detection....
Result 1: Sentiment Phrase List (SePL)
In opinion mining research, lists of opinion-bearing words play an essential role in analysing expressions of opinion. The procedure developed in this thesis for automatically generating such a list makes an important research contribution to this field. On the one hand, the novel approach makes it possible to include phrases of several words (incl. negations, intensifying and attenuating particles) as well as idioms; on the other hand, the opinion values of all phrases are computed automatically from a suitable corpus. The Sentiment Phrase List and the procedure have been published and can be used by the research community [121, 123]. The list is built from a textual and an additional numerical rating of the kind typically used in customer reviews (for example, the title and the star rating of Amazon customer reviews). Other data sources that provide such ratings can also be used. Several versions of the SePL were created and published on the basis of approx. 1.5 million German customer reviews [120].
Result 2: Algorithm based on the SePL
The SePL and the opinion-bearing phrases it contains improve lexicon-based methods for analysing expressions of opinion. Phrases are often separated by other words in a text, so the phrases must first be identified. The algorithm for lexicon-based opinion analysis has been published [176]. It is based on opinion-bearing phrases consisting of one or more words. Because different opinion values are available for individual phrases, a more precise assessment is possible than with previous approaches: opinion-bearing phrases can be extracted from the text and rated in a differentiated way using the entries in the SePL. Previous approaches often use individual opinion-bearing words. With the SePL, the opinion value of, for example, a negation does not have to be derived by a general rule. In current methods, the value of an opinion-bearing word is usually inverted when a negation is present, which often yields wrong results. In the best case, the list contains an opinion value both for the individual word and for its negation (e.g. "schön" [beautiful] and "nicht schön" [not beautiful]).
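The difference between looking up whole phrases and inverting single-word values can be illustrated with a short sketch. It is not the published algorithm [176]: the lexicon entries and opinion values below are invented for illustration, and the matcher only finds contiguous phrases, whereas the text above notes that phrases are often separated by other words.

```python
# Hypothetical miniature lexicon: phrase (as a token tuple) -> opinion value in [-1, 1].
LEXICON = {
    ("schön",): 0.7,
    ("nicht", "schön"): -0.5,
    ("sehr", "schön"): 0.9,
    ("schlecht",): -0.7,
    ("nicht", "schlecht"): 0.3,
}


def score_phrases(tokens, lexicon, max_len=3):
    """Greedy longest-match lookup of contiguous phrases in `tokens`."""
    matches = []
    i = 0
    while i < len(tokens):
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + length])
            if phrase in lexicon:
                matches.append((" ".join(phrase), lexicon[phrase]))
                i += length  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this token
    return matches


def score_by_inversion(tokens, lexicon):
    """Baseline: single-word lookup, flipping the sign after 'nicht'."""
    matches = []
    negate = False
    for token in tokens:
        if token == "nicht":
            negate = True
            continue
        value = lexicon.get((token,))
        if value is not None:
            matches.append((token, -value if negate else value))
        negate = False
    return matches


if __name__ == "__main__":
    tokens = "das auto ist nicht schlecht aber nicht schön".split()
    print(score_phrases(tokens, LEXICON))
    print(score_by_inversion(tokens, LEXICON))
```

With these made-up values, the phrase lookup rates "nicht schlecht" as mildly positive, while plain inversion turns it into a strongly positive value. That is the kind of distortion a phrase list with separate entries for negated forms avoids.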
Result 3: Evaluation of the application of the SePL
The algorithm from Result 2 was evaluated with reviews from the rating platform Ciao in the domain of car insurance. This revealed major sources of error [176], which enable corresponding improvements. In addition, the SePL was evaluated with a machine learning method based on a support vector machine. Various existing lexical resources were compared with the SePL, and their use in different domains was examined. The results were published in [115].
Result 4: Research project PoliTwi - topic detection of top political topics
In the PoliTwi research project, the required data were collected from Twitter. In addition, current top political topics are continuously made available to the general public through various channels. For evaluating the intended improvements in topic detection combined with opinion analysis, the required data from the domain of politics are available for a period of three years so far. Topic detection was carried out on the basis of these data. The computed topics were compared with other systems such as Google Trends or Tagesschau Meta (see Chapter 5.3). It was shown that opinion analysis can improve topic detection. The results of the project were published in [124]. Furthermore, a service (including the Twitter channel at https://twitter.com/politwi) is provided to the public, and in particular to journalists and politicians, that informs them about current top topics. News portals such as FOCUS Online used this service in their reporting (see Chapter 4.3.6.1). The top topics have been determined since mid-2013 and can also be retrieved from the project website [119].
Result 5: Extension of lexical resources at the concept level
The still young research field of concept-level sentiment analysis tries to improve on previous approaches to opinion analysis by analysing expressions of opinion at the concept level. A prerequisite is lists of opinion-bearing words that allow differentiated treatment across different contexts. Using the top topics and their context, a procedure was developed for creating or extending such lists. It was shown how opinions are rated differently in different contexts and how this information can be recorded in lexical resources, which can then be used in concept-level sentiment analysis. The procedure was published in [124].
Computer science students enter university with very different competencies, experience and knowledge. We exploited 145 datasets on first-year computer science students, collected by learning management systems and related to exam outcomes and learning-disposition data (e.g. student dispositions, previous experiences and attitudes measured through self-reported surveys), to identify indicators that predict academic success and thus enable effective interventions for an extremely heterogeneous group of students.
The 8th conference on higher-education didactics of computer science (Fachtagung für Hochschuldidaktik der Informatik, HDI) took place in Frankfurt in September 2018 together with the German e-learning conference on computer science (Deutsche E-Learning Fachtagung Informatik, DeLFI) under the joint motto "Digitalisierungswahnsinn? – Wege der Bildungstransformationen" ("Digitalisation madness? Paths of educational transformation").
The HDI addresses all questions of computer science education in higher education. This year's focal points included:
– Analysis of the content and target competencies of computer science courses
– Learning to program & getting started in software development
– Special topics: data science, theoretical computer science and academic working methods
The conference addresses selected questions from these subject areas, which are treated in depth through talks by renowned experts and through submitted contributions.
Background: Although mortality after cardiac surgery has significantly decreased in the last decade, patients still experience clinically relevant postoperative complications. Among others, atrial fibrillation (AF) is a common consequence of cardiac surgery, which is associated with prolonged hospitalization and increased mortality.
Methods: We retrospectively analyzed data from patients who underwent coronary artery bypass grafting, valve surgery or a combination of both at the University Hospital Muenster between April 2014 and July 2015. We evaluated the incidence of new onset and intermittent/permanent AF (patients with pre- and postoperative AF). Furthermore, we investigated the impact of postoperative AF on clinical outcomes and evaluated potential risk factors.
Results: In total, 999 patients were included in the analysis. New onset AF occurred in 24.9% of the patients and the incidence of intermittent/permanent AF was 59.5%. Both types of postoperative AF were associated with prolonged ICU length of stay (median increase approx. 2 days) and duration of mechanical ventilation (median increase 1 h). Additionally, new onset AF patients had a higher rate of dialysis and hospital mortality and more positive fluid balance on the day of surgery and postoperative days 1 and 2. In a multiple logistic regression model, advanced age (odds ratio (OR) = 1.448 per decade increase, p < 0.0001), a combination of CABG and valve surgery (OR = 1.711, p = 0.047), higher C-reactive protein (OR = 1.06 per unit increase, p < 0.0001) and creatinine plasma concentration (OR = 1.287 per unit increase, p = 0.032) significantly predicted new onset AF. Higher Horowitz index values were associated with a reduced risk (OR = 0.996 per unit increase, p = 0.012). In a separate model, higher plasma creatinine concentration (OR = 2.125 per unit increase, p = 0.022) was a significant risk factor for intermittent/permanent AF whereas higher plasma phosphate concentration (OR = 0.522 per unit increase, p = 0.003) indicated reduced occurrence of this arrhythmia.
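As a reading aid for the "per decade" and "per unit" odds ratios reported above, the following sketch shows how such a ratio compounds. It assumes the usual logistic regression interpretation (log-odds linear in the predictor, all other covariates held fixed). The baseline probability in the example is invented and not taken from the study, and the function names are illustrative.

```python
def odds_ratio_for(or_per_unit, units):
    """Odds ratio for a change of `units`, given the odds ratio per single unit."""
    return or_per_unit ** units


def shift_probability(baseline_prob, odds_ratio):
    """Probability after multiplying the baseline odds by `odds_ratio`."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    OR_AGE_PER_DECADE = 1.448  # value reported in the abstract
    two_decades = odds_ratio_for(OR_AGE_PER_DECADE, 2)
    print(two_decades)
    # Hypothetical baseline risk of 20 %, purely for illustration:
    print(shift_probability(0.20, two_decades))
```

Under this assumption an age difference of two decades corresponds to an odds ratio of 1.448² ≈ 2.10. Odds ratios act on odds rather than on probabilities, so the resulting risk depends on the baseline.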
Conclusions: New onset and intermittent/permanent AF are associated with adverse clinical outcomes of elective cardiac surgery patients. Different risk factors implicated in postoperative AF suggest different mechanisms might be involved in its pathogenesis. Customized clinical management protocols seem to be warranted for a higher success rate of prevention and treatment of postoperative AF.
The transverse momentum distributions of the strange and double-strange hyperon resonances (Σ(1385)^±, Ξ(1530)^0) produced in p–Pb collisions at √s_NN = 5.02 TeV were measured in the rapidity range −0.5 < y_CMS < 0 for event classes corresponding to different charged-particle multiplicity densities, ⟨dN_ch/dη_lab⟩. The mean transverse momentum values are presented as a function of ⟨dN_ch/dη_lab⟩, as well as a function of the particle masses, and are compared with previous results on hyperon production. The integrated yield ratios of excited to ground-state hyperons are constant as a function of ⟨dN_ch/dη_lab⟩. The equivalent ratios to pions exhibit an increase with ⟨dN_ch/dη_lab⟩, depending on their strangeness content.
Motivation: Arabidopsis thaliana is a well-established model system for the analysis of the basic physiological and metabolic pathways of plants. Nevertheless, the system is not yet fully understood, although many mechanisms have been described and information exists for many processes. Combining and interpreting the large amount of biological data remains a big challenge, not only because data sets for metabolic pathways are still incomplete. They are also often inconsistent, because they come from different experiments of various scales that differ, for example, in accuracy and/or significance. Here, theoretical modeling is a powerful way to formulate hypotheses about pathways and the dynamics of the metabolism, even if the biological data are incomplete. To develop reliable mathematical models, the models have to be checked for consistency. This is still a challenging task because many verification techniques already fail for medium-sized models. Consequently, new methods, such as decomposition methods or reduction approaches, are being developed to circumvent this problem.
Methods: We present a new semi-quantitative mathematical model of the metabolism of Arabidopsis thaliana. We used the Petri net formalism to express the complex reaction system in a mathematically unique manner. To verify the model for correctness and consistency we applied concepts of network decomposition and network reduction such as transition invariants, common transition pairs, and invariant transition pairs.
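The notion of a transition invariant can be made concrete with a toy example. In the standard Petri net definition, a T-invariant is a non-zero vector x of non-negative integers with C·x = 0, where C is the place-by-transition incidence matrix. Firing every transition as often as x prescribes returns the net to its starting marking. The four-transition net below is invented for illustration and has nothing to do with the Arabidopsis model. The brute-force search over 0/1 vectors is only meant for very small nets.

```python
from itertools import product

# Toy net (hypothetical): places P1, P2, P3; transitions
#   t1: P1 -> P2,  t2: P2 -> P3,  t3: P3 -> P1,  t4: P2 -> P1
# Rows are places, columns are transitions; entry = tokens produced minus consumed.
INCIDENCE = [
    [-1, 0, 1, 1],   # P1
    [1, -1, 0, -1],  # P2
    [0, 1, -1, 0],   # P3
]


def is_t_invariant(incidence, x):
    """True if x is a non-zero, non-negative integer vector with C * x = 0."""
    if all(v == 0 for v in x) or any(v < 0 for v in x):
        return False
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in incidence)


def binary_t_invariants(incidence):
    """All T-invariants with entries in {0, 1} (exhaustive search, tiny nets only)."""
    n_transitions = len(incidence[0])
    return [x for x in product((0, 1), repeat=n_transitions)
            if is_t_invariant(incidence, x)]


if __name__ == "__main__":
    for invariant in binary_t_invariants(INCIDENCE):
        print(invariant)
```

In this toy net the search finds the two cycles t1, t4 and t1, t2, t3. Decomposing a network into such invariants is one way to check that every transition takes part in some steady-state pathway.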
Results: We formulated the core metabolism of Arabidopsis thaliana based on recent knowledge from literature, including the Calvin cycle, glycolysis and citric acid cycle, glyoxylate cycle, urea cycle, sucrose synthesis, and the starch metabolism. By applying network decomposition and reduction techniques at steady-state conditions, we suggest a straightforward mathematical modeling process. We demonstrate that potential steady-state pathways exist, which provide the fixed carbon to nearly all parts of the network, especially to the citric acid cycle. There is a close cooperation of important metabolic pathways, e.g., the de novo synthesis of uridine-5-monophosphate, the γ-aminobutyric acid shunt, and the urea cycle. The presented approach extends the established methods for a feasible interpretation of biological network models, in particular of large and complex models.
This volume contains the papers presented at the First International Workshop on Rewriting Techniques for Program Transformations and Evaluation (WPTE 2014) which was held on July 13, 2014 in Vienna, Austria during the Vienna Summer of Logic 2014 (VSL 2014) as a workshop of the Sixth Federated Logic Conference (FLoC 2014). WPTE 2014 was affiliated with the 25th International Conference on Rewriting Techniques and Applications joined with the 12th International Conference on Typed Lambda Calculi and Applications (RTA/TLCA 2014).
Background: Signal transduction pathways are important cellular processes to maintain the cell’s integrity. Their imbalance can cause severe pathologies. As signal transduction pathways feature complex regulations, they form intertwined networks. Mathematical models aim to capture their regulatory logic and allow an unbiased analysis of robustness and vulnerability of the signaling network. Pathway detection is yet a challenge for the analysis of signaling networks in the field of systems biology. A rigorous mathematical formalism is lacking to identify all possible signal flows in a network model.
Results: In this paper, we introduce the concept of Manatee invariants for the analysis of signal transduction networks. We present an algorithm for the characterization of the combinatorial diversity of signal flows, e.g., from signal reception to cellular response. We demonstrate the concept for a small model of the TNFR1-mediated NF-κB signaling pathway. Manatee invariants reveal all possible signal flows in the network. Further, we show the application of Manatee invariants for in silico knockout experiments. Here, we illustrate the biological relevance of the concept.
Conclusions: The proposed mathematical framework reveals the entire variety of signal flows in models of signaling systems, including cyclic regulations. Thereby, Manatee invariants allow for the analysis of robustness and vulnerability of signaling networks. The application to further analyses such as for in silico knockout was shown. The new framework of Manatee invariants contributes to an advanced examination of signaling systems.
Exploring biophysical properties of virus-encoded components and their requirement for virus replication is an exciting new area of interdisciplinary virological research. To date, spatial resolution has only rarely been analyzed in computational/biophysical descriptions of virus replication dynamics. However, it is widely acknowledged that intracellular spatial dependence is a crucial component of virus life cycles. The hepatitis C virus-encoded NS5A protein is an endoplasmic reticulum (ER)-anchored viral protein and an essential component of the virus replication machinery. Therefore, we simulate NS5A dynamics on realistic reconstructed, curved ER surfaces by means of surface partial differential equations (sPDE) upon unstructured grids. We match the in silico NS5A diffusion constant such that the NS5A sPDE simulation data reproduce experimental NS5A fluorescence recovery after photobleaching (FRAP) time series data. This parameter estimation yields the NS5A diffusion constant. Such parameters are needed for spatial models of HCV dynamics, which we are developing in parallel but which remain qualitative at this stage. Thus, our present study likely provides the first quantitative biophysical description of the movement of a viral component. Our spatio-temporally resolved ansatz paves new ways for understanding intricate spatially defined processes central to specific aspects of virus life cycles.
Automated deduction in higher-order program calculi, where properties of transformation rules are demanded, or confluence or other equational properties are requested, can often be done by syntactically computing overlaps (critical pairs) of reduction rules and transformation rules. Since higher-order calculi have alpha-equivalence as fundamental equivalence, the reasoning procedure must deal with it. We define ASD1-unification problems, which are higher-order equational unification problems employing variables for atoms, expressions and contexts, with additional distinct-variable constraints, and which have to be solved w.r.t. alpha-equivalence. Our proposal is to extend nominal unification to solve these unification problems. We succeeded in constructing the nominal unification algorithm NomUnifyASC. We show that NomUnifyASC is sound and complete for this problem class, and outputs a set of unifiers with constraints in nondeterministic polynomial time if the final constraints are satisfiable. We also show that solvability of the output constraints can be decided in NEXPTIME, and for a fixed number of context-variables in NP time. For terms without context-variables and atom-variables, NomUnifyASC runs in polynomial time, is unitary, and extends the classical problem by permitting distinct-variable constraints.
1998 ACM Subject Classification F.4.1 Mathematical Logic
We propose a model for measuring the runtime of concurrent programs by the minimal number of evaluation steps. The focus of this paper is on improvements, which are program transformations that improve this number in every context, where we distinguish between sequential and parallel improvements, for one or more processors, respectively. We apply the methods to CHF, a model of Concurrent Haskell extended by futures. The language CHF is a typed higher-order functional language with concurrent threads, monadic IO and MVars as synchronizing variables. We show that all deterministic reduction rules and 15 further program transformations are sequential and parallel improvements. We also show that introduction of deterministic parallelism is a parallel improvement, and its inverse a sequential improvement, provided it is applicable. This is a step towards more automated precomputation of concurrent programs during compile time, which is also formally proven to be correctly optimizing.
Data structures and advanced models of computation on big data : report from Dagstuhl seminar 14091
(2014)
This report documents the program and the outcomes of Dagstuhl Seminar 14091 "Data Structures and Advanced Models of Computation on Big Data". In today's computing environment vast amounts of data are processed, exchanged and analyzed. The manner in which information is stored profoundly influences the efficiency of these operations over the data. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances.
The seminar covered both recent advances in the "classical" data structuring topics as well as new models of computation adapted to modern architectures, scientific studies that reveal the need for such models, applications where large data sets play a central role, modern computing platforms for very large data, and new data structures for large data in modern architectures.
The extended abstracts included in this report both present recent state-of-the-art advances and lay the foundation for new directions within data structures research.
Synaptic release sites are characterized by exocytosis-competent synaptic vesicles tightly anchored to the presynaptic active zone (PAZ) whose proteome orchestrates the fast signaling events involved in synaptic vesicle cycle and plasticity. Allocation of the amyloid precursor protein (APP) to the PAZ proteome implicated a functional impact of APP in neuronal communication. In this study, we combined state-of-the-art proteomics, electrophysiology and bioinformatics to address protein abundance and functional changes at the native hippocampal PAZ in young and old APP-KO mice. We evaluated if APP deletion has an impact on the metabolic activity of presynaptic mitochondria. Furthermore, we quantified differences in the phosphorylation status after long-term-potentiation (LTP) induction at the purified native PAZ. We observed an increase in the phosphorylation of the signaling enzyme calmodulin-dependent kinase II (CaMKII) only in old APP-KO mice. During aging APP deletion is accompanied by a severe decrease in metabolic activity and hyperphosphorylation of CaMKII. This attributes an essential functional role to APP at hippocampal PAZ and putative molecular mechanisms underlying the age-dependent impairments in learning and memory in APP-KO mice.
Random graph models, originally conceived to study the structure of networks and the emergence of their properties, have become an indispensable tool for experimental algorithmics. Amongst them, hyperbolic random graphs form a well-accepted family, yielding realistic complex networks while being both mathematically and algorithmically tractable. We introduce two generators MemGen and HyperGen for the G_{alpha,C}(n) model, which distributes n random points within a hyperbolic plane and produces m=n*d/2 undirected edges for all point pairs close by; the expected average degree d and exponent 2*alpha+1 of the power-law degree distribution are controlled by alpha>1/2 and C. Both algorithms emit a stream of edges which they do not have to store. MemGen keeps O(n) items in internal memory and has a time complexity of O(n*log(log n) + m), which is optimal for networks with an average degree of d=Omega(log(log n)). For realistic values of d=o(n / log^{1/alpha}(n)), HyperGen reduces the memory footprint to O([n^{1-alpha}*d^alpha + log(n)]*log(n)). In an experimental evaluation, we compare HyperGen with four generators among which it is consistently the fastest. For small d=10 we measure a speed-up of 4.0 compared to the fastest publicly available generator increasing to 29.6 for d=1000. On commodity hardware, HyperGen produces 3.7e8 edges per second for graphs with 1e6 < m < 1e12 and alpha=1, utilising less than 600MB of RAM. We demonstrate nearly linear scalability on an Intel Xeon Phi.
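As a point of reference for what such generators compute, the following is a minimal quadratic-time sketch of a threshold hyperbolic random graph. It is not MemGen or HyperGen. It assumes the common formulation in which the disk radius is R = 2 ln n + C, angles are uniform, radii follow the density alpha*sinh(alpha*r)/(cosh(alpha*R) - 1), and two points are joined when their hyperbolic distance is at most R. The function names `sample_points`, `hyperbolic_distance`, and `stream_edges` are illustrative and do not come from the paper.

```python
import math
import random


def sample_points(n, alpha, C, rng):
    """Sample n points (radius, angle) in a hyperbolic disk.

    Assumption: disk radius R = 2 ln n + C, as in the usual threshold model.
    """
    R = 2.0 * math.log(n) + C
    points = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Inverse-CDF sampling for the radial density
        # alpha * sinh(alpha * r) / (cosh(alpha * R) - 1) on [0, R].
        u = rng.random()
        r = math.acosh(1.0 + u * (math.cosh(alpha * R) - 1.0)) / alpha
        points.append((r, theta))
    return R, points


def hyperbolic_distance(p, q):
    """Distance between two points given in polar coordinates (r, theta)."""
    (r1, t1), (r2, t2) = p, q
    cosh_d = (math.cosh(r1) * math.cosh(r2)
              - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(1.0, cosh_d))  # clamp against rounding below 1


def stream_edges(n, alpha, C, seed=0):
    """Yield edges (u, v) with u < v one by one, without storing the edge list."""
    rng = random.Random(seed)
    R, pts = sample_points(n, alpha, C, rng)
    for u in range(n):
        for v in range(u + 1, n):
            if hyperbolic_distance(pts[u], pts[v]) <= R:
                yield (u, v)
```

The sketch checks all point pairs and therefore needs quadratic time; it only mirrors the streaming output style described in the abstract, not the stated time or memory bounds.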
Interest in becoming a data scientist, or in entering related professions in the data science domain, is growing rapidly. To meet this demand, we propose a novel educational service that aims to provide tailored learning paths for data science. Our target user is one who aims to be an expert in data science. Our approach is to analyze the background of the practitioner and match the learning units. A critical feature is that we use gamification to reinforce practitioner engagement. We believe that our work provides a practical guideline for those who want to learn data science.
Exhaustive, automatic testing of dataflow (esp. mapreduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints, and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. "concolic") execution in a conventional programming language, adapted to the unique features of the dataflow domain.
In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach.
The work presented in this thesis aims to raise the degree of automation in analog circuit design. To this end, a framework was developed to provide the mechanisms necessary to carry out a fully automated analog circuit synthesis, i.e., the construction of an analog circuit fulfilling all previously defined (electrical) specifications. Nowadays, analog circuit design is in general a very time-consuming process compared to a digital design flow. Due to its discrete nature, the digital design process is highly automated and thus very efficient compared to analog circuit design. In modern very-large-scale integration (VLSI) circuits the analog parts mostly occupy just a small portion of the overall chip area, yet this small portion is known to consume a major part of the needed workforce. Paired with product cycles that constantly get shorter, the time needed to develop the analog parts of an integrated circuit (IC) becomes a determining factor. Apart from this, the ongoing progress in semiconductor processing technologies promises more speed with less power consumption on smaller areas, forcing IC developers to keep pace with the technology nodes in order to maintain competitiveness. Analog circuitry exhibits the inherent property of being hard to reuse, as porting from one technology node to another imposes critical changes to operating conditions (e.g., supply voltage), mostly leading to a full redesign of most of the analog modules. This productivity gap between digital and analog design is the primary motivation for this thesis. Due to the availability of commercial sizing tools, this work deliberately focuses on the construction of circuit topologies as distinct from parameter synthesis, which can be obtained with a dedicated sizing tool. The focus on circuit construction allows the development of a framework that permits a full design space exploration.
This thesis describes the concepts and methods needed to realize a deterministic, explorative analog synthesis framework. In addition, a reference implementation is presented, which demonstrates the applicability in current analog design flows.
This paper describes the ongoing efforts of the authors to present ancient Greek and Roman numismatic data on the public internet, with an emphasis on efforts to integrate information from multiple sources using Linked Data and Semantic Web techniques. By way of very modern metaphor, it is useful to think of coins as intentionally created packages of 'named entities'. Each coin was struck by a particular authority, often at a known site, and coins often make reference to familiar concepts such as deities, historical events, or symbols that were widely recognized in the ancient world. The institutions represented among the authors have deployed search interfaces that allow users to take advantage of this aspect of numismatic databases. The American Numismatic Society's database provides faceted search to its collection of over 550,000 objects. The Portable Antiquities Scheme (PAS) in the UK presents individual finds (and hoards) recorded throughout the country. The Römisch-Germanische Kommission and the University of Frankfurt (DBIS) are developing a prototype metaportal (INTERFACE) that accesses national databases of coin finds held in Frankfurt, Vienna and Utrecht. Each of these resources is beginning to explore Semantic Web/Linked Data approaches, so the role of numismatic standards is immediately coming to the fore. DBIS and INTERFACE are developing a numismatic ontology. At the ANS and PAS, the public database already presents RDF serializations based on Dublin Core. Together, the authors have begun to explore standardization of conceptual names on the basis of the vocabulary presented at the site http://nomisma.org . Nomisma.org is a collaborative effort to provide stable digital representations of numismatic concepts and entities. It provides URIs for such basic concepts as 'coin', 'mint', 'axis'. All of these are defined within the scope of numismatics but are already being linked to other stable resources where available.
This is particularly the case for mints. For example, the URI http://nomisma.org/id/corinth is intended to represent that ancient city in its role as a minter/issuer of coins. The URI is linked via the SKOS ontology to the Pleiades Gazetteer of ancient places. This allows Nomisma to be the basis for a common representation of the concept that an object is a coin minted at Corinth. The ANS has already deployed such relationships in its public database. The work of all these projects is very much in progress so that this paper hopes to generate discussion on how multiple large projects can move forward in their own work while encouraging sufficient commonality to support large scale research questions undertaken by diverse audiences.
Spam detection in wikis
(2012)
Through their collaborative properties, wikis have contributed significantly to the emergence of Web 2.0: the cooperation of many users has made it possible to prepare large amounts of data and compile them in a structured way. A wealth of data has thus accumulated that is valuable for the machine processing of text: using text mining techniques, much information can be extracted from wikis. To this end, it is first useful to download their contents and store them locally.
Often there are no access restrictions on editing pages. This enables the accumulation of information mentioned above, since many users can participate. However, it carries the risk that wikis become polluted by spam, which hinders their use as a knowledge base.
Common anti-spam measures operate online and rely, among other things, on monitoring by users or on blacklists for web links. In contrast, this thesis takes the following approach: a locally stored wiki is inventoried and examined in its entirety. Only the contents of the pages are taken into account. Spam detection is based on a combination of decision rules and the consideration of word probabilities. This yielded good results.
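How a content-only check might combine decision rules with word probabilities can be sketched as follows. This is an illustrative assumption, not the method of the thesis: the link-count rule, the Laplace-smoothed log-odds score, the thresholds, and all function names (`train`, `log_odds`, `too_many_links`, `is_spam`) are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-zäöüß]+")


def tokens(text):
    """Lower-case the page text and split it into alphabetic word tokens."""
    return TOKEN.findall(text.lower())


def train(spam_pages, ham_pages):
    """Count word occurrences in labelled spam and non-spam pages."""
    spam, ham = Counter(), Counter()
    for page in spam_pages:
        spam.update(tokens(page))
    for page in ham_pages:
        ham.update(tokens(page))
    return spam, ham


def log_odds(text, spam, ham):
    """Sum of per-word log ratios P(w|spam) / P(w|ham), Laplace-smoothed."""
    vocab = len(set(spam) | set(ham))
    n_spam, n_ham = sum(spam.values()), sum(ham.values())
    score = 0.0
    for w in tokens(text):
        p_spam = (spam[w] + 1) / (n_spam + vocab)
        p_ham = (ham[w] + 1) / (n_ham + vocab)
        score += math.log(p_spam / p_ham)
    return score


def too_many_links(text, max_links=3):
    """Hypothetical decision rule: flag pages with many web links."""
    return len(re.findall(r"https?://", text)) > max_links


def is_spam(text, spam, ham):
    """A page is flagged if a rule fires or the word evidence favours spam."""
    return too_many_links(text) or log_odds(text, spam, ham) > 0.0
```

The rule and the probability score are deliberately kept separate so that either can be tuned or replaced without touching the other.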
We study Gaifman locality and Hanf locality of an extension of first-order logic with modulo p counting quantifiers (FO+MODp, for short) with arbitrary numerical predicates. We require that the validity of formulas is independent of the particular interpretation of the numerical predicates and refer to such formulas as arb-invariant formulas. This paper gives a detailed picture of locality and non-locality properties of arb-invariant FO+MODp. For example, on the class of all finite structures, for any p ≥ 2, arb-invariant FO+MODp is neither Hanf nor Gaifman local with respect to a sublinear locality radius. However, in case that p is an odd prime power, it is weakly Gaifman local with a polylogarithmic locality radius. And when restricting attention to the class of string structures, for odd prime powers p, arb-invariant FO+MODp is both Hanf and Gaifman local with a polylogarithmic locality radius. Our negative results build on examples of order-invariant FO+MODp formulas presented in Niemistö’s PhD thesis. Our positive results make use of the close connection between FO+MODp and Boolean circuits built from NOT-gates and AND-, OR-, and MODp-gates of arbitrary fan-in.
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dictionary lookup’ strategy performs in this context and find that such an approach can significantly outdo baselines such as edit distance, weighted edit distance, and the noisy channel Brill and Moore model for spelling error correction. We also consider elementary combination techniques for our models such as language model weighted majority voting and center string combination. Finally, we consider real-world OCR post-correction for a dataset sampled from medieval Latin texts.
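The contrast between the edit-distance baseline and the ‘k-best decoding plus dictionary lookup’ strategy can be sketched as follows. This is a minimal illustration under stated assumptions: the ranked candidate list `k_best` stands in for the output of any string-to-string model, the fallback to the top candidate when no candidate is in the dictionary is an assumption, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_baseline(word, dictionary):
    """Baseline: nearest dictionary word; ties are broken alphabetically."""
    return min(sorted(dictionary), key=lambda w: edit_distance(word, w))


def k_best_with_lookup(k_best, dictionary):
    """Return the highest-ranked candidate that is a dictionary word.

    Assumption: if no candidate is in the dictionary, keep the top candidate.
    """
    for candidate in k_best:
        if candidate in dictionary:
            return candidate
    return k_best[0]
```

For example, `k_best_with_lookup(["teh", "the", "ten"], {"the", "ten"})` skips the non-word at rank one and returns the first candidate found in the dictionary.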
Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities - bridging the gap between Computer Science and Digital Humanities”.
1998 ACM Subject Classification I.2.7 Natural Language Processing, J.5 Arts and Humanities
In order to promote the accessibility of biodiversity data in historic and contemporary literature, we introduce a new interdisciplinary project called BIOfid (FID=Fachinformationsdienst, a service for providing specialized information). The project aims at a mobilization of data available in print only by combining digitization of scientific biodiversity literature with the development of innovative text mining tools for complex, eventually semantic searches throughout the complete text corpus. A major prerequisite for the development of such search tools is the provision of sophisticated anatomy ontologies on the one hand, and of complete lists of species names (currently considered valid as well as all synonyms) at a global scale on the other hand. In the initial stage, we chose examples from German publications of the past 250 years dealing with the geographic distribution and ecology of vascular plants (Tracheophyta), birds (Aves), as well as moths and butterflies (Lepidoptera) in Germany. These taxa have been prioritized according to current demands of German research groups (about 50 sites) aiming at analyses and modeling of distribution patterns and their changes through time. In the long term, we aim at providing data and open source software applicable for any taxon and geographic region. For this purpose, a platform for open access journals for long-term availability of professional e-journals will be established. All generated data will also be made accessible through GFBio (German Federation for Biological Data). BIOfid is supported by the LIS-Scientific Library Services and Information Systems program of the German Research Foundation (DFG).
We present results on transverse momentum (pT) and rapidity (y) differential production cross sections, mean transverse momentum and mean transverse momentum square of inclusive J/ψ and ψ(2S) at forward rapidity (2.5 < y < 4) as well as ψ(2S)-to-J/ψ cross section ratios. These quantities are measured in pp collisions at center of mass energies √s = 5.02 and 13 TeV with the ALICE detector. Both charmonium states are reconstructed in the dimuon decay channel, using the muon spectrometer. A comprehensive comparison to inclusive charmonium cross sections measured at √s = 2.76, 7 and 8 TeV is performed. A comparison to non-relativistic quantum chromodynamics and fixed-order next-to-leading logarithm calculations, which describe prompt and non-prompt charmonium production respectively, is also presented. A good description of the data is obtained over the full pT range, provided that both contributions are summed. In particular, it is found that for pT > 15 GeV/c the non-prompt contribution reaches up to 50% of the total charmonium yield.
We explore space improvements in LRP, a polymorphically typed call-by-need functional core language. A relaxed space measure is chosen for the maximal size usage during an evaluation. It abstracts from the details of the implementation via abstract machines, but it takes garbage collection into account and thus can be seen as a realistic approximation of space usage. The results are: a context lemma for space improving translations and for space equivalences; all but one reduction rule of the calculus are shown to be space improvements, and the exceptional one, the copy-rule, is shown to increase space only moderately.
Several further program transformations are shown to be space improvements or space equivalences; in particular, the translation into machine expressions is a space equivalence. These results are a step forward in making predictions about the change in runtime space behavior of optimizing transformations in call-by-need functional languages.
Motivated by tools for automated deduction on functional programming languages and programs, we propose a formalism to symbolically represent $\alpha$-renamings for meta-expressions. The formalism is an extension of usual higher-order meta-syntax which allows $\alpha$-renaming of all valid ground instances of a meta-expression to fulfill the distinct variable convention. The renaming mechanism may be helpful for several reasoning tasks in deduction systems. We present our approach for a meta-language which uses higher-order abstract syntax and a meta-notation for recursive let-bindings, contexts, and environments. It is used in the LRSX Tool -- a tool to reason on the correctness of program transformations in higher-order program calculi with respect to their operational semantics. Besides introducing a formalism to represent symbolic $\alpha$-renamings, we present and analyze algorithms for simplification of $\alpha$-renamings, matching, rewriting, and checking $\alpha$-equivalence of symbolically $\alpha$-renamed meta-expressions.
We introduce rewriting of meta-expressions which stem from a meta-language that uses higher-order abstract syntax augmented by meta-notation for recursive let, contexts, sets of bindings, and chain variables. Additionally, three kinds of constraints can be added to meta-expressions to express usual constraints on evaluation rules and program transformations. Rewriting of meta-expressions is required for automated reasoning on programs and their properties. A concrete application is a procedure to automatically prove correctness of program transformations in higher-order program calculi which may permit recursive let-bindings as they occur in functional programming languages. Rewriting on meta-expressions can be performed by solving the so-called letrec matching problem which we introduce. We provide a matching algorithm to solve it. We show that the letrec matching problem is NP-complete, that our matching algorithm is sound and complete, and that it runs in non-deterministic polynomial time.
This is a short summary of a recent survey [FR03] focusing on the observed evidence that Internet connectivity is positively correlated with the spread of democracy at high levels of significance. The results of the multivariate correlation analysis and the probability regression models are based on the combined analysis of data series from mid-1991 to 2001 from Eurostat, the US Census Bureau, the World Bank, and the OECD’s statistical data service, which track the growth of information technology and ratings of freedom and democracy worldwide.
We present an implementation of an interpreter LRPi for the call-by-need calculus LRP, based on a variant of Sestoft's abstract machine Mark 1, extended with an eager garbage collector. It is used as a tool for exact space usage analyses as a support for our investigations into space improvements of call-by-need calculi.
50 years of amino acid hydrophobicity scales : revisiting the capacity for peptide classification
(2016)
Background: Physicochemical properties are frequently analyzed to characterize protein sequences of known and unknown function. The hydrophobicity of amino acids in particular is often used for structural prediction or for the detection of membrane associated or embedded β-sheets and α-helices. For this purpose many scales classifying amino acids according to their physicochemical properties have been defined over the past decades. In parallel, several hydrophobicity parameters have been defined for calculation of peptide properties. We analyzed the performance of separating sequence pools using 98 hydrophobicity scales and five different hydrophobicity parameters, namely the overall hydrophobicity, the hydrophobic moment for detection of the α-helical and β-sheet membrane segments, the alternating hydrophobicity and the exact β-strand score.
Results: Most of the scales are capable of discriminating between transmembrane α-helices and transmembrane β-sheets, but assignment of peptides to pools of soluble peptides of different secondary structures is not achieved at the same quality. The separation capacity as measure of the discrimination between different structural elements is best by using the five different hydrophobicity parameters, but addition of the alternating hydrophobicity does not provide a large benefit. An in silico evolutionary approach shows that scales have limitation in separation capacity with a maximal threshold of 0.6 in general. We observed that scales derived from the evolutionary approach performed best in separating the different peptide pools when values for arginine and tyrosine were largely distinct from the value of glutamate. Finally, the separation of secondary structure pools via hydrophobicity can be supported by specific detectable patterns of four amino acids.
Conclusion: It could be assumed that the quality of separation capacity of a certain scale depends on the spacing of the hydrophobicity value of certain amino acids. Irrespective of the wealth of hydrophobicity scales a scale separating all different kinds of secondary structures or between soluble and transmembrane peptides does not exist reflecting that properties other than hydrophobicity affect secondary structure formation as well. Nevertheless, application of hydrophobicity scales allows distinguishing between peptides with transmembrane α-helices and β-sheets. Furthermore, the overall separation capacity score of 0.6 using different hydrophobicity parameters could be assisted by pattern search on the protein sequence level for specific peptides with a length of four amino acids.
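Two of the hydrophobicity parameters named in the Background, the overall hydrophobicity and the hydrophobic moment, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the Kyte-Doolittle scale as one example scale, Eisenberg's definition of the hydrophobic moment normalised by peptide length, and the conventional periodicity angles of 100° per residue for α-helices and roughly 160° to 180° for β-strands. The paper's exact definitions may differ, and the function names are illustrative.

```python
import math

# Kyte-Doolittle hydropathy values, used here as one example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq, scale=KYTE_DOOLITTLE):
    """Overall hydrophobicity: mean scale value over all residues."""
    return sum(scale[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq, angle_deg, scale=KYTE_DOOLITTLE):
    """Length-normalised hydrophobic moment for a given angle per residue.

    Residue i contributes a vector of length scale[aa] pointing at angle
    i * angle_deg; the moment is the magnitude of the vector sum.
    """
    delta = math.radians(angle_deg)
    s = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    c = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(s, c) / len(seq)
```

A peptide that alternates hydrophobic and hydrophilic residues, such as `"LKLK"`, yields a larger moment at 180° than at 100°, which is the kind of signal used to tell strand-like from helix-like periodicity.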
Magnetoencephalography (MEG) measures neural activity non-invasively and at an excellent temporal resolution. Since its invention (Cohen, 1968, 1972), MEG has proven a most valuable tool in neurocognitive (Salmelin et al., 1994) and clinical research (Stufflebeam et al., 2009; Van ’t Ent et al., 2003). MEG is able to measure rapid changes in electrophysiological neural signals related to sensory and cognitive processes. The magnetic fields measured outside the head by MEG directly reflect the cortical currents generated by the synchronised activity of thousands of neuronal sources. This distinguishes MEG from functional magnetic resonance imaging (fMRI), where measurements are only indirectly related to electrophysiological activity through neurovascular coupling...
Die zunehmende Verbreitung des Internets als universelles Netzwerk zum Transport von Daten aller Art hat in den letzten zwei Dekaden dazu geführt, dass die anfallenden Datenmengen von traditionellen Datenbanksystemen kaum mehr effektiv zu verarbeiten sind. Das liegt zum einen darin, dass ein immer größerer Teil der Erdbevölkerung Zugang zum Internet hat, zum Beispiel via
Internet-fähigen Smartphones, und dessen Dienste nutzen möchte. Zudem tragen immer höhere verfügbare Bandbreiten für den Internetzugang dazu bei, dass die weltweit erzeugten Informationen mittlerweile exponentiell steigen.
Das führte zur Entwicklung und Implementierung von Technologien, um diese immensen Datenmengen wirksam verarbeiten zu können. Diese Technologien können unter dem Sammelbegriff "Big Data" zusammengefasst werden und beschreiben dabei Verfahren, um strukturierte und unstrukturierte Informationen im Tera- und Exabyte-Bereich sogar in Echtzeit verarbeiten zu können. Als Basis dienen dabei Datenbanksysteme, da sie ein bewährtes und praktisches Mittel sind, um Informationen zu strukturieren, zu organisieren, zu manipulieren und effektiv abrufen zu können. Wie bereits erwähnt, hat sich herausgestellt, dass traditionelle Datenbanksysteme, die auf dem relationalen Datenmodell basieren, nun mit Datenmengen konfrontiert sind, mit denen sie nicht sehr gut hinsichtlich der Performance und dem Energieverbrauch skalieren. Dieser Umstand führte zu der Entwicklung von spezialisierten Datenbanksystemen, die andere Daten- und Speichermodelle implementieren und für diese eine deutlich höhere Performance bieten.
Zusätzlich erfordern Datenbanksysteme im Umfeld von "Big Data" wesentlich größere Investitionen in die Anzahl von Servern, was dazu geführt hat, dass immer mehr große und sehr große Datenverarbeitungszentren entstanden sind. In der Zwischenzeit sind die Aufwendungen für Energie zum Betrieb und Kühlen dieser Zentren ein signifikanter Kostenfaktor geworden. Dementsprechend sind bereits Anstrengungen unternommen worden, das Themenfeld Energieeffizienz (die Relation zwischen Performance und Energieverbrauch) von Datenbanksystemen eingehender zu untersuchen.
More than 150 database systems are now known, each with its own strengths and weaknesses in terms of performance, energy consumption, and ultimately energy efficiency. End users of database systems now face the difficult task of determining the most suitable database system for a given use case with regard to these factors. The reason is that hardly any objective and independent comparative figures exist to support the decision, and that such figures are mostly obtained by running benchmarks on a wide variety of technical platforms. Running a benchmark repeatedly with widely varying parameters (among others the data volume, different combinations of technical components, and the operating system) requires large investments in time and equipment if the resulting comparative figures are to be as broad as possible.
One option is to simulate the execution of a benchmark instead of actually running it, in order to minimize the investment in equipment and, above all, time. Such simulations also have the advantage that, for example, developers of database systems can simulate the effects of architectural changes on performance and energy efficiency instead of having to evaluate them through lengthy regression tests. For such simulations to gain practical relevance, the difference between the simulated and the actually measured comparison metrics must of course be as small as possible. In addition, a suitable simulation must be able to reproduce as large a number of database systems and technical components as possible.
This dissertation shows that such a simulation is feasible. In a first step, the factors influencing the performance, energy consumption, and energy efficiency of a database system were identified, and their effect was determined on the basis of experimental results. In addition, suitable metrics and general properties of database systems and of benchmarks were evaluated. In a second step, a suitable simulation model was developed and successively refined. At each development step, real experiments in the form of benchmark runs were carried out for a wide variety of database systems and technical platforms. These experiments were then reproduced with the simulation model in order to calculate the difference between real and simulated benchmark results. The results of the last development step show that this difference is below eight percent. The dissertation also shows that the simulation model is suitable not only for simulating established benchmarks but also, more generally, for simulating a database system and the technical platform on which it runs. This also enables the simulation of other use cases, for example regression tests.
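The comparison between real and simulated benchmark results can be illustrated with a minimal sketch. It assumes that energy efficiency is expressed as operations per joule and that the "difference" is a relative deviation from the real measurement; the abstract does not state the exact metrics used, so the function names and all numbers below are hypothetical.

```python
def energy_efficiency(operations: float, energy_joules: float) -> float:
    """Performance per unit of energy, here assumed to be operations per joule."""
    if energy_joules <= 0:
        raise ValueError("energy_joules must be positive")
    return operations / energy_joules


def relative_deviation(real: float, simulated: float) -> float:
    """Relative difference of a simulated value from the real measurement."""
    if real == 0:
        raise ValueError("real measurement must be non-zero")
    return abs(simulated - real) / abs(real)


# Hypothetical figures for one benchmark run (not taken from the dissertation).
real_eff = energy_efficiency(operations=1_000_000, energy_joules=5_000)
sim_eff = energy_efficiency(operations=1_000_000, energy_joules=5_250)

deviation = relative_deviation(real_eff, sim_eff)
within_bound = deviation < 0.08  # assumed reading of the eight percent bound

print(f"real: {real_eff:.1f} ops/J, simulated: {sim_eff:.1f} ops/J")
print(f"relative deviation: {deviation:.2%}, within bound: {within_bound}")
```

In practice such a check would be repeated for every combination of database system, platform, and benchmark, which is where a simulation saves time compared with real benchmark runs.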