OPUS 4 | Institutes

Institutes

Mathematik (363)
Informatik (1577)

32 search hits

1 to 10

Sort by

Voting for POS tagging of latin texts: using the flair of FLAIR to better ensemble classifiers by example of latin (2020)

Stoeckel, Manuel ; Henlein, Alexander ; Hemati, Wahed ; Mehler, Alexander

Despite the great importance of the Latin language in the past, there are relatively few resources available today to develop modern NLP tools for this language. Therefore, the EvaLatin Shared Task for Lemmatization and Part-of-Speech (POS) tagging was published in the LT4HALA workshop. In our work, we dealt with the second EvaLatin task, that is, POS tagging. Since most of the available Latin word embeddings were trained on either few or inaccurate data, we trained several embeddings on better data in the first step. Based on these embeddings, we trained several state-of-the-art taggers and used them as input for an ensemble classifier called LSTMVoter. We were able to achieve the best results for both the cross-genre and the cross-time task (90.64% and 87.00%) without using additional annotated data (closed modality). In the meantime, we further improved the system and achieved even better results (96.91% on classical, 90.87% on cross-genre and 87.35% on cross-time).

Unconditional reflexive polytopes (2020)

Kohl, Florian ; Olsen, McCabe ; Sanyal, Raman

A convex body is unconditional if it is symmetric with respect to reflections in all coordinate hyperplanes. We investigate unconditional lattice polytopes with respect to geometric, combinatorial, and algebraic properties. In particular, we characterize unconditional reflexive polytopes in terms of perfect graphs. As a prime example, we study the signed Birkhoff polytope. Moreover, we derive constructions for Gale-dual pairs of polytopes and we explicitly describe Gröbner bases for unconditional reflexive polytopes coming from partially ordered sets.

The 𝒮-cone and a primal-dual view on second-order representability (2020)

Naumann, Helen ; Theobald, Thorsten

The 𝒮-cone provides a common framework for cones of polynomials or exponen- tial sums which establish non-negativity upon the arithmetic-geometric inequality, in particular for sums of non-negative circuit polynomials (SONC) or sums of arithmetic- geometric exponentials (SAGE). In this paper, we study the S-cone and its dual from the viewpoint of second-order representability. Extending results of Averkov and of Wang and Magron on the primal SONC cone, we provide explicit generalized second- order descriptions for rational S-cones and their duals.

The p-adic section conjecture for localisations of curves (2020)

Lüdtke, Martin

The $p$-adic section conjecture predicts that for a smooth, proper, hyperbolic curve $X$ over a $p$-adic field $k$, every section of the map of étale fundamental groups $\pi_1(X) \to G_k$ is induced by a unique $k$-rational point of $X$. While this conjecture is still open, the birational variant in which $X$ is replaced by its generic point is known due to Koenigsmann. Generalising an alternative proof of Pop, we extend this result to certain localisations of $X$ at a set of closed points $S$, an intermediate version in between the full section conjecture and its birational variant. As one application, we prove the section conjecture for $X_S$ whenever $S$ is a countable set of closed points.

The impact of current and future control measures on the spread of COVID-19 in Germany (2020)

Barbarossa, Maria Vittoria ; Fuhrmann, Jan ; Meinke, Jan H. ; Krieg, Stefan F. ; Vinod Varma, Hridya ; Castelletti, Noemi ; Lippert, Thomas

The novel coronavirus (SARS-CoV-2), identified in China at the end of December 2019 and causing the disease COVID-19, has meanwhile led to outbreaks all over the globe with about 2.2 million confirmed cases and more than 150,000 deaths as of April 17, 2020 [37]. In view of most recent information on testing activity [32], we present here an update of our initial work [4]. In this work, mathematical models have been developed to study the spread of COVID-19 among the population in Germany and to asses the impact of non-pharmaceutical interventions. Systems of differential equations of SEIR type are extended here to account for undetected infections, as well as for stages of infections and age groups. The models are calibrated on data until April 5, data from April 6 to 14 are used for model validation. We simulate different possible strategies for the mitigation of the current outbreak, slowing down the spread of the virus and thus reducing the peak in daily diagnosed cases, the demand for hospitalization or intensive care units admissions, and eventually the number of fatalities. Our results suggest that a partial (and gradual) lifting of introduced control measures could soon be possible if accompanied by further increased testing activity, strict isolation of detected cases and reduced contact to risk groups.

TextAnnotator: a UIMA based tool for the simultaneous and collaborative annotation of texts (2020)

Abrami, Giuseppe ; Stoeckel, Manuel ; Mehler, Alexander

The annotation of texts and other material in the field of digital humanities and Natural Language Processing (NLP) is a common task of research projects. At the same time, the annotation of corpora is certainly the most time- and cost-intensive component in research projects and often requires a high level of expertise according to the research interest. However, for the annotation of texts, a wide range of tools is available, both for automatic and manual annotation. Since the automatic pre-processing methods are not error-free and there is an increasing demand for the generation of training data, also with regard to machine learning, suitable annotation tools are required. This paper defines criteria of flexibility and efficiency of complex annotations for the assessment of existing annotation tools. To extend this list of tools, the paper describes TextAnnotator, a browser-based, multi-annotation system, which has been developed to perform platform-independent multimodal annotations and annotate complex textual structures. The paper illustrates the current state of development of TextAnnotator and demonstrates its ability to evaluate annotation quality (inter-annotator agreement) at runtime. In addition, it will be shown how annotations of different users can be performed simultaneously and collaboratively on the same document from different platforms using UIMA as the basis for annotation.

Space optimizations in deterministic and concurrent call-by-need functional programming languages (2020)

Dallmeyer, Nils

In this thesis the space consumption and runtime of lazy-evaluating functional programming languages are analyzed. The typed and extended lambda-calculi LRP and CHF* as core languages for Haskell and Concurrent Haskell are used. For each LRP and CHF* compatible abstract machines are introduced. Too lower the distortion of space measurement a classical implementable garbage collector is applied after each LRP reduction step. Die size of expressions and the space measure spmax as maximal size of all garbage-free expressions during an LRP-evaluation, are defined. Program-Transformations are considered as code-to-code transformations. The notions Space Improvement and Space Equivalence as properties of transformations are defined. A Space Improvement does neither change the semantics nor it increases the needed space consumption, for a space equivalence the space consumption is required to remain the same. Several transformations are shown as Space Improvements and Equivalences. An abstract machine for space measurements is introduced. An implementation of this machine is used for more complex space- and runtime-analyses. Total Garbage Collection replaces subexpressions by a non-terminating constant with size zero, if the overall termination is not affected. Thereby the notion of improvement is more independent from the used garbage collector. Analogous to Space Improvements and Equivalences the notions Total Space Improvement and Total Space Equivalence are defined, which use Total Garbage Collection during the space measurement. Several Total Space Improvements and Equivalences are shown. Space measures for CHF* are defined, that are compatible to the space measure of LRP. An algorithm with sort-complexity is developed, that calculates the required space of independent processes that all start and end together. If a constant amount of synchronization restrictions is added and a constant number of processors is used, the runtime is polynomial, if arbitrary synchronizations are used, then the problem is NP-complete. Abstract machines for space- and time-analyses in CHF* are developed and implementations of these are used for space and runtime analyses.

Some results on the theory and practice of algorithms (2020)

Hammer, David

This thesis presents research which spans three conference papers and one manuscript which has not yet been submitted for peer review. The topic of 1 is the inherent complexity of maintaining perfect height in B-trees. We consider the setting in which a B-tree of optimal height contains n = (1−ϵ)N elements where N is the number of elements in full B-tree of the same height (the capacity of the tree). We show that the rebalancing cost when updating the tree—while maintaining optimal height—depends on ϵ. Specifically, our analysis gives a lower bound for the rebalancing cost of Ω(1/(ϵB)). We then describe a rebalancing algorithm which has an amortized rebalancing cost with an almost matching upper bound of O(1/(ϵB)⋅log²(min{1/ϵ,B})). We additionally describe a scheme utilizing this algorithm which, given a rebalancing budget f(n), maintains optimal height for decreasing ϵ until the cost exceeds the budget at which time it maintains optimal height plus one. Given a rebalancing budget of Θ(logn), this scheme maintains optimal height for all but a vanishing fraction of sizes in the intervals between tree capacities. Manuscript 2 presents empirical analysis of practical randomized external-memory algorithms for computing the connected components of graphs. The best known theoretical results for this problem are essentially all derived from results for minimum spanning tree algorithms. In the realm of randomized external-memory MST algorithms, the best asymptotic result has I/O-complexity O(sort(|E|)) in expectation while an empirically studied practical algorithm has a bound of O(sort(|E|)⋅log(|V|/M)). We implement and evaluate an algorithm for connected components with expected I/O-complexity O(sort(|E|))—a simplification of the MST algorithm with this asymptotic cost, we show that this approach may also yield good results in practice. In paper 3, we present a novel approach to simulating large-scale population protocol models. Naive simulation of N interactions of a population protocol with n agents and m states requires Θ(nlogm) bits of memory and Θ(N) time. For very large n, this is prohibitive both in memory consumption and time, as interesting protocols will typically require N > n interactions for convergence. We describe a histogram-based simulation framework which requires Θ(mlogn) bits of memory instead—an improvement as it is typically the case that n ≫ m. We analyze, implement, and compare a number of different data structures to perform correct agent sampling in this regime. For this purpose, we develop dynamic alias tables which allow sampling an interaction in expected amortized constant time. We then show how to use sampling techniques to process agent interactions in batches, giving a simulation approach which uses subconstant time per interaction under reasonable assumptions. With paper 4, we introduce the new model of fragile complexity for comparison-based algorithms. Within this model, we analyze classical comparison-based problems such as finding the minimum value of a set, selection (or finding the median), and sorting. We prove a number of lower and upper bounds and in particular, we give a number of randomized results which describe trade-offs not achievable by deterministic algorithms.

Scalable generation of random graphs (2020)

Penschuck, Manuel

Netzwerkmodelle spielen in verschiedenen Wissenschaftsdisziplinen eine wichtige Rolle und dienen unter anderem der Beschreibung realistischer Graphen. Sie werden häufig als Zufallsgraphen formuliert und stellen somit Wahrscheinlichkeitsverteilungen über Graphen dar. Meist ist die Verteilung dabei parametrisiert und ergibt sich implizit, etwa über eine randomisierten Konstruktionsvorschrift. Ein früher Vertreter ist das G(n,p) Modell, welches über allen ungerichteten Graphen mit n Knoten definiert ist und jede Kante unabhängig mit Wahrscheinlichkeit p erzeugt. Ein aus G(n,p) gezogener Graph hat jedoch kaum strukturelle Ähnlichkeiten zu Graphen, die zumeist in Anwendungen beobachtet werden. Daher sind populäre Modelle so gestaltet, dass sie mit hinreichend hoher Wahrscheinlichkeit gewünschte topologische Eigenschaften erzeugen. Beispielsweise ist es ein gängiges Ziel die nur unscharf definierte Klasse der sogenannten komplexen Netzwerke nachzubilden, der etwa viele soziale Netze zugeordnet werden. Unter anderem verfügen diese Graphen in der Regel über eine Gradverteilung mit schweren Rändern (heavy-tailed), einen kleinen Durchmesser, eine dominierende Zusammenhangskomponente, sowie über überdurchschnittlich dichte Teilbereiche, sogenannte Communities. Die Einsatzmöglichkeiten von Netzwerkmodellen gehen dabei weit über das ursprüngliche Ziel, beobachtete Effekte zu erklären, hinaus. Ein gängiger Anwendungsfall besteht darin, Daten systematisch zu produzieren. Solche Daten ermöglichen oder unterstützen experimentelle Untersuchungen, etwa zur empirischen Verifikation theoretischer Vorhersagen oder zur allgemeinen Bewertung von Algorithmen und Datenstrukturen. Hierbei ergeben sich insbesondere für große Probleminstanzen Vorteile gegenüber beobachteten Netzen. So sind massive Eingaben, die auf echten Daten beruhen, oft nicht in ausreichender Menge verfügbar, nur aufwendig zu beschaffen und zu verwalten, unterliegen rechtlichen Beschränkungen, oder sind von unklarer Qualität. In der vorliegenden Arbeit betrachten wir daher algorithmische Aspekte der Generierung massiver Zufallsgraphen. Um Anwendern Reproduzierbarkeit mit vorhandenen Studien zu ermöglichen, fokussieren wir uns hierbei zumeist auf getreue Implementierungen etablierter Netzwerkmodelle, etwa Preferential Attachment-Prozesse, LFR, simple Graphen mit vorgeschriebenen Gradsequenzen, oder Graphen mit hyperbolischer (o.Ä.) Einbettung. Zu diesem Zweck entwickeln wir praktisch sowie analytisch effiziente Generatoren. Unsere Algorithmen sind dabei jeweils auf ein geeignetes Maschinenmodell hin optimiert. Hierzu entwerfen wir etwa klassische sequentielle Generatoren für Registermaschinen, Algorithmen für das External Memory Model, und parallele Ansätze für verteilte oder Shared Memory-Maschinen auf CPUs, GPUs, und anderen Rechenbeschleunigern.

Requirements engineering and tool-support for security and privacy (2020)

Pape, Sebastian

In order to address security and privacy problems in practice, it is very important to have a solid elicitation of requirements, before trying to address the problem. In this thesis, specific challenges of the areas of social engineering, security management and privacy enhancing technologies are analyzed: Social Engineering: An overview of existing tools usable for social engineering is provided and defenses against social engineering are analyzed. Serious games are proposed as a more pleasant way to raise employees’ awareness and to train them. Security Management: Specific requirements for small and medium sized energy providers are analyzed and a set of tools to support them in assessing security risks and improving their security is proposed. Larger enterprises are supported by a method to collect security key performance indicators for different subsidiaries and with a risk assessment method for apps on mobile devices. Furthermore, a method to select a secure cloud provider – the currently most popular form of outsourcing – is provided. Privacy Enhancing Technologies: Relevant factors for the users’ adoption of privacy enhancing technologies are identified and economic incentives and hindrances for companies are discussed. Privacy by design is applied to integrate privacy into the use cases e-commerce and internet of things.

1 to 10

Open Access

Institutes

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

32 search hits