OPUS 4 | Institutes

Institutes

Mathematik (377)
Informatik (1618)

204 search hits

101 to 150

Sort by

Arbitrage theory in models with transaction costs beyond efficient friction (2021)

In the first part of this thesis, we introduce the concept of prospective strict no-arbitrage for discrete-time financial market models with proportional transaction. The prospective strict no-arbitrage condition, which is a variant of strict no-arbitrage, is slightly weaker than the robust no-arbitrage condition. It still implies that the set of portfolios attainable from zero initial endowment is closed in probability. Consequently, prospective strict no-arbitrage implies the existence of consistent prices, which may lie on the boundary of the bid-ask spread. A weak version of prospective strict no-arbitrage turns out to be equivalent to the existence of a consistent price system. In continuous-time financial market models with proportional transaction costs, efficient friction, i.e., nonvanishing transaction costs, is a standing assumption. Together with robust no free lunch with vanishing risk, it rules out strategies of infinite variation which usually appear in frictionless financial markets. In the second part of this thesis, we show how models with and without transaction costs can be unified. The bid and the ask price of a risky asset are given by cadlag processes which are locally bounded from below and may coincide at some points. In a first step, we show that if the bid-ask model satisfies no unbounded profit with bounded risk for simple long-only strategies, then there exists a semimartingale lying between the bid and the ask price process. In a second step, under the additional assumption that the zeros of the bid-ask spread are either starting points of an excursion away from zero or inner points from the right, we show that for every bounded predictable strategy specifying the amount of risky assets, the semimartingale can be used to construct the corresponding self-financing risk-free position in a consistent way. Finally, the set of most general strategies is introduced, which also provides a new view on the frictionless case.

Impact of rescanning and repositioning on radiomic features employing a multi-object phantom in magnetic resonance imaging (2021)

Bernatz, Simon ; Zhdanovich, Yauheniya ; Ackermann, Jörg ; Koch, Ina ; Wild, Peter Johannes ; Pinto dos Santos, Daniel ; Vogl, Thomas J. ; Kaltenbach, Benjamin ; Rosbach, Nicolas

Our purpose was to analyze the robustness and reproducibility of magnetic resonance imaging (MRI) radiomic features. We constructed a multi-object fruit phantom to perform MRI acquisition as scan-rescan using a 3 Tesla MRI scanner. We applied T2-weighted (T2w) half-Fourier acquisition single-shot turbo spin-echo (HASTE), T2w turbo spin-echo (TSE), T2w fluid-attenuated inversion recovery (FLAIR), T2 map and T1-weighted (T1w) TSE. Images were resampled to isotropic voxels. Fruits were segmented. The workflow was repeated by a second reader and the first reader after a pause of one month. We applied PyRadiomics to extract 107 radiomic features per fruit and sequence from seven feature classes. We calculated concordance correlation coefficients (CCC) and dynamic range (DR) to obtain measurements of feature robustness. Intraclass correlation coefficient (ICC) was calculated to assess intra- and inter-observer reproducibility. We calculated Gini scores to test the pairwise discriminative power specific for the features and MRI sequences. We depict Bland Altmann plots of features with top discriminative power (Mann–Whitney U test). Shape features were the most robust feature class. T2 map was the most robust imaging technique (robust features (rf), n = 84). HASTE sequence led to the least amount of rf (n = 20). Intra-observer ICC was excellent (≥ 0.75) for nearly all features (max–min; 99.1–97.2%). Deterioration of ICC values was seen in the inter-observer analyses (max–min; 88.7–81.1%). Complete robustness across all sequences was found for 8 features. Shape features and T2 map yielded the highest pairwise discriminative performance. Radiomics validity depends on the MRI sequence and feature class. T2 map seems to be the most promising imaging technique with the highest feature robustness, high intra-/inter-observer reproducibility and most promising discriminative power.

An exploratory latent class analysis of student expectations towards learning analytics services (2021)

Whitelock-Wainwright, Alexander ; Tsai, Yi-Shan ; Drachsler, Hendrik ; Scheffel, Maren ; Gašević, Dragan

For service implementations to be widely adopted, it is necessary for the expectations of the key stakeholders to be considered. Failure to do so may lead to services reflecting ideological gaps, which will inadvertently create dissatisfaction among its users. Learning analytics research has begun to recognise the importance of understanding the student perspective towards the services that could be potentially offered; however, student engagement remains low. Furthermore, there has been no attempt to explore whether students can be segmented into different groups based on their expectations towards learning analytics services. In doing so, it allows for a greater understanding of what is and is not expected from learning analytics services within a sample of students. The current exploratory work addresses this limitation by using the three-step approach to latent class analysis to understand whether student expectations of learning analytics services can clearly be segmented, using self-report data obtained from a sample of students at an Open University in the Netherlands. The findings show that student expectations regarding ethical and privacy elements of a learning analytics service are consistent across all groups; however, those expectations of service features are quite variable. These results are discussed in relation to previous work on student stakeholder perspectives, policy development, and the European General Data Protection Regulation (GDPR).

Integration von PartNet in VANNOTATOR unter Gewährleistung von Objektsegmentierungen (2021)

Felden, Tobias

Szenen automatisch aus Texten generieren zu können ist eine interessante Aufgabe der Informatik. Für diese Aufgabe wurde VANNOTATOR (Mehler und Abrami 2019, Abrami, Spiekermann und Mehler 2019, Spiekermann, Abrami und Mehler 2018) entwickelt, ein Framework, das die Beschreibung bzw. Beschriftung von VR-Szenen ermöglicht. Damit für diese Szenen die benötigten 3D-Objekte bereitgestellt werden können, sind entsprechende Datenbanken vonnöten. Diese Datenbanken müssen umfangreich annotiert sein, damit diese Aufgabe bewältigt werden kann. Deshalb wurde im Falle des VANNOTATORs auf die ShapeNetSem Datenbank zurückgegriffen (Abrami, Henlein, Kett u. a. 2020). Je detailreicher eine Szene dargestellt wird, desto detailreicher kann diese auch durch einen Text beschrieben werden. Aus diesem Grund wird die Datenbank um einen Teilbereich von PartNet (Mo u. a. 2019) erweitert. Dieser erlaubt die Option, Objekte zu segmentieren, und erweitert hierdurch das annotierbare Vokabular. Manche der bereits vorhandenen ShapeNetSem-Objekte verfügen über die Eigenschaft, dass sie auch PartNet-Objekte sind. Diese Arbeit befasst sich mit der Umsetzung, wie ShapeNetSem-Objekte mit hinterlegten PartNetObjekten durch diese ersetzt werden können. Um das zu bewerkstelligen, wurde ein Panel entworfen, in welchem ein PartNet-Objekt mit samt seinen einzelnen Segmenten aufgeführt wird. Diese Segmente können nun wie ShapeNetSem-Objekte ausgewählt und in einer Szene platziert werden. Dadurch werden 1.881 Objekte mit wiederum 34.016 Unterobjekten VANNOTATOR zur Verfügung gestellt. Dieses vergrößerte Vokabular hilft Natural Language Processing noch effektiver und präziser voranzutreiben.

Ornstein-Uhlenbeck-Fluktuationen im Linienzählprozess anzestraler Selektionsgraphen (2022)

Weinel, Anna-Lena

Die Arbeit befasst sich mit zwei funktionalen Grenzwertsätzen für skalierte Linienzählprozesse von anzestralen Selektionsgraphen. Dazu werden zwei Modelle aus der mathematischen Populationsgenetik betrachtet. Wir führen zuerst das Moran-Modell mit gerichteter Selektion mit konstanter Populationsgröße N in kontinuierlicher Zeit und den Linienzählprozess des anzestralen Selektionsgraphen (MASP) gemäß Krone und Neuhauser (Theor. Popul. Biol. 1997) ein. Die Hauptaussage dieser Abschlussarbeit besagt, dass der passend standardisierte MASP im Fall der moderaten Selektion für N gegen unendlich in Verteilung gegen einen Ornstein-Uhlenbeck-Prozess konvergiert. Das zweite betrachtete Modell ist das Cannings-Modell mit gerichteter Selektion in diskreter Zeit, das gemäß Boenkost, González Casanova, Pokalyuk und Wakolbinger (Electron. J. Probab. 2021) eingeführt wird. Für ein Teilregime der moderat schwachen Selektion wird bewiesen, dass die reskalierten Fluktuationen des Linienzählprozesses des anzestralen Selektionsgraphen im Cannings-Modell ebenfalls in Verteilung gegen einen Ornstein-Uhlenbeck-Prozess konvergieren.

Unveiling functions of the visual cortex using task-specific deep neural networks (2021)

Dwivedi, Kshitij ; Bonner, Michael F. ; Cichy, Radoslaw Martin ; Roig Noguera, Gemma

Abstract: The human visual cortex enables visual perception through a cascade of hierarchical computations in cortical regions with distinct functionalities. Here, we introduce an AI-driven approach to discover the functional mapping of the visual cortex. We related human brain responses to scene images measured with functional MRI (fMRI) systematically to a diverse set of deep neural networks (DNNs) optimized to perform different scene perception tasks. We found a structured mapping between DNN tasks and brain regions along the ventral and dorsal visual streams. Low-level visual tasks mapped onto early brain regions, 3-dimensional scene perception tasks mapped onto the dorsal stream, and semantic tasks mapped onto the ventral stream. This mapping was of high fidelity, with more than 60% of the explainable variance in nine key regions being explained. Together, our results provide a novel functional mapping of the human visual cortex and demonstrate the power of the computational approach. Author Summary: Human visual perception is a complex cognitive feat known to be mediated by distinct cortical regions of the brain. However, the exact function of these regions remains unknown, and thus it remains unclear how those regions together orchestrate visual perception. Here, we apply an AI-driven brain mapping approach to reveal visual brain function. This approach integrates multiple artificial deep neural networks trained on a diverse set of functions with functional recordings of the whole human brain. Our results reveal a systematic tiling of visual cortex by mapping regions to particular functions of the deep networks. Together this constitutes a comprehensive account of the functions of the distinct cortical regions of the brain that mediate human visual perception.

On fibrations approaching the Arakelov equality (2021)

Bieri, Maximilian

The sum of Lyapunov exponents Lf of a semi-stable fibration is the ratio of the degree of the Hodge bundle by the Euler characteristic of the base. This ratio is bounded from above by the Arakelov inequality. Sheng-Li Tan showed that for fiber genus g≥2 the Arakelov equality is never attained. We investigate whether there are sequences of fibrations approaching asymptotically the Arakelov bound. The answer turns out to be no, if the fibration is smooth, or non-hyperelliptic, or has a small base genus. Moreover, we construct examples of semi-stable fibrations showing that Teichmüller curves are not attaining the maximal possible value of Lf.

Monocular scene flow estimation for dynamic traffic scenes (2021)

Brickwedde, Fabian

The main topic of the present thesis is scene flow estimation in a monocular camera system. Scene flow describes the joint representation of 3D positions and motions of the scene. A special focus is placed on approaches that combine two kinds of information, deep-learning-based single-view depth estimation and model-based multi-view geometry. The first part addresses single-view depth estimation focussing on a method that provides single-view depth information in an advantageous form for monocular scene flow estimation methods. A convolutional neural network, called ProbDepthNet, is proposed, which provides pixel-wise well-calibrated depth distributions. The experiments show that different strategies for quantifying the measurement uncertainty provide overconfident estimates due to overfitting effects. Therefore, a novel recalibration technique is integrated as part of the ProbDepthNet, which is validated to improve the calibration of the uncertainty measures. The monocular scene flow methods presented in the subsequent parts confirm that the integration of single-view depth information results in the best performance if the neural network provides depth distributions instead of single depth values and contains a recalibration. Three methods for monocular scene flow estimation are presented, each one designed to combine multi-view geometry-based optimization with deep learning-based single-view depth estimation such as ProbDepthNet. While the first method, SVD-MSfM, performs the motion and depth estimation as two subsequent steps, the second method, Mono-SF, jointly optimizes the motion estimates and the depth structure. Both methods are tailored to address scenes, where the objects and motions can be represented by a set of rigid bodies. Dynamic traffic scenes are one kind of scenes that essentially fulfill this characteristic. The method, Mono-Stixel, uses an even more specialized scene model for traffic scenes, called stixel world, as underlying scene representation. The proposed methods provide new state of the art for monocular scene flow estimation with Mono-SF being the first and leading monocular method on the KITTI scene flow benchmark at the time of submission of the present thesis. The experiments validate that both kind of information, the multi-view geometric optimization and the single-view depth estimates, contribute to the monocular scene flow estimates and are necessary to achieve the new state of the art accuracy.

Sublinear circuits for polyhedral sets (2021)

Naumann, Helen ; Theobald, Thorsten

Sublinear circuits are generalizations of the affine circuits in matroid theory, and they arise as the convex-combinatorial core underlying constrained non-negativity certificates of exponential sums and of polynomials based on the arithmetic-geometric inequality. Here, we study the polyhedral combinatorics of sublinear circuits for polyhedral constraint sets. We give results on the relation between the sublinear circuits and their supports and provide necessary as well as sufficient criteria for sublinear circuits. Based on these characterizations, we provide some explicit results and enumerations for two prominent polyhedral cases, namely the non-negative orthant and the cube [− 1,1]n.

A fractional Hadamard formula and applications (2021)

Djitte, Sidy Moctar ; Fall, Mouhamed Moustapha ; Weth, Tobias

We derive a shape derivative formula for the family of principal Dirichlet eigenvalues λs(Ω) of the fractional Laplacian (−Δ)s associated with bounded open sets Ω⊂RN of class C1,1. This extends, with a help of a new approach, a result in Dalibard and Gérard-Varet (Calc. Var. 19(4):976–1013, 2013) which was restricted to the case s=12. As an application, we consider the maximization problem for λs(Ω) among annular-shaped domains of fixed volume of the type B∖B¯¯¯¯′, where B is a fixed ball and B′ is ball whose position is varied within B. We prove that λs(B∖B¯¯¯¯′) is maximal when the two balls are concentric. Our approach also allows to derive similar results for the fractional torsional rigidity. More generally, we will characterize one-sided shape derivatives for best constants of a family of subcritical fractional Sobolev embeddings.

Solving an inverse elliptic coefficient problem by convex non-linear semidefinite programming (2021)

Harrach, Bastian von

Several applications in medical imaging and non-destructive material testing lead to inverse elliptic coefficient problems, where an unknown coefficient function in an elliptic PDE is to be determined from partial knowledge of its solutions. This is usually a highly non-linear ill-posed inverse problem, for which unique reconstructability results, stability estimates and global convergence of numerical methods are very hard to achieve. The aim of this note is to point out a new connection between inverse coefficient problems and semidefinite programming that may help addressing these challenges. We show that an inverse elliptic Robin transmission problem with finitely many measurements can be equivalently rewritten as a uniquely solvable convex non-linear semidefinite optimization problem. This allows to explicitly estimate the number of measurements that is required to achieve a desired resolution, to derive an error estimate for noisy data, and to overcome the problem of local minima that usually appears in optimization-based approaches for inverse coefficient problems.

Text2Scene : an interface for object-based processing of part-whole relations (2021)

Kühn, Vincent Roy

Die folgende Arbeit handelt von einer Text2Scene Anwendung, welche in der Virtual Reality (VR) umgesetzt wurde. Das System ermöglicht es den Usern aus einer Beschreibung einer Szene, diese virtuell nachzustellen. Dies bietet eine neue Art der Interaktion mit einem Text, die die visuelle Komponente hervorhebt und somit eine Geschichte auf neue Wege erfahrbar macht. Dazu kann der User einen fertigen Text entweder vom Server zu laden oder einen eigenen erstellen, der dann automatisch verarbeitet wird. Dabei werden die vorhanden physischen Objekte im Text automatisch erkannt und dem User als 3D-Objekte in der virtuellen Umgebung zur Verfügung gestellt. Diese können dann manuell platziert werden und erzeugen dadurch die Szene, die im Ausgangstext beschrieben wurde. Das Ziel der Textverarbeitung ist eine möglichst genaue Beschreibung der Objekte, damit diese zielgerichtet in der Objektdatenbank gesucht werden können. Bei der Textverarbeitung wird besonderer Wert auf das Erkennen von Teil-Ganz Beziehungen gelegt. Sodass Objekte, die im Text vorkommen und ein Holonym besitzen, automatisch mit diesem verknüpft werden. Gleichzeitig wird die Teil-Ganz Beziehung aber auch in die andere Richtung genauer betrachtet. Die Textverarbeitung soll ferner dazu in der Lage sein, Objekte genauer zu spezifizieren und an den Kontext des Textes anzupassen. Weiterhin wurde das Natural Language Processing (NLP) so ausgebaut, dass der Kontext des Textes erkannt wird und die Objekte entsprechend kategorisiert werden. Die Textverarbeitung wird mithilfe eines Neuronalen Netzes implementiert. Die verwendeten Tools zur Erkennung von Teil-Ganz Beziehungen, Kontext und Spezifikation von Objekten wurden anhand von Texteingaben nach der Genauigkeit der Ausgabe evaluiert. Zur Nutzung der Textverarbeitung wurde eine virtuelle Szene entwickelt, die das Erstellen von eigenen Szenen aus vorher geladenen beziehungsweise eingegebenen Texten ermöglicht. Dazu kann der Nutzer manuell oder automatisch Objekte laden lassen, die er dann platzieren kann.

Haldane’s asymptotics for slightly supercritical processes (2021)

Boenkost, Florin

Analysing survival or fixation probabilities for a beneficial allele is a prominent task in the field of theoretical population genetics. Haldane's asymptotics is an approximation for the fixation probability in the case of a single beneficial mutant with small selective advantage in a large population. In this thesis we analyse the interplay between genetic drift and directional selection and prove Haldane's asymptotics in different settings: For the fixation probability in Cannings models with moderate selection and for the survival probability of a slightly supercritical branching processes in a random environment. In Chapter 3 we introduce a class of Cannings models with selection that allow for a forward and backward construction. In particular, a Cannings ancestral selection process can be defined for this class of models, which counts the number of potential parents and is in sampling duality to the forward frequency process. By means of this duality the probability of fixation can be expressed through the expectation of the Cannings ancestral selection process in stationarity. A control of this expectation yields that the fixation probability fulfils Haldane's asymptotics in a regime of moderately weak selection (Thm. 8). In Chapter 4 we study the fixation probability of Cannings models in a regime of moderately strong selection. Here couplings of the frequency process of beneficial individuals with slightly supercritical Galton-Watson processes imply that the fixation probability is given by Haldane's asymptotics (Thm. 9). Lastly, in Chapter 5 we consider slightly supercritical branching processes in an independent and identically distributed random environment and study the probability of survival as the number of expected offspring tends from above to one. We show that only if variance and expectation of the random offspring mean are of the same order the random environment has a non-trivial influence on the probability of survival, which results in a modification of Haldane's asymptotics. Out of the critical parameter regime the population goes extinct or survives with a probability that fulfils Haldane's asymptotics (Thm. 10). The proof establishes an expression for the survival probability in terms of the shape function of the random offspring generating functions. This expression exhibits similarities to perpetuities known from a financial context. Consequently, we prove a limiting theorem for perpetuities with vanishing interest rates (Thm. 11).

Generic methodology for modeling and simulation of vibration, wear and noise in electromechanical drivetrains (2020)

Kolluru, Yashwant

This work describes development of a comprehensive methodology for analyzing vibro-acoustic and wear mechanisms in transmission systems. The thesis addresses certain gaps present in the fields of structure dynamics and abrasion mechanism and opens new areas for further research. The paper attempts to understand new and relatively unexplored challenges like influences of wear on the dynamics of drive train. It also focuses on developing new techniques for analyzing the vibration and acoustic behavior of the drive unit structures and surrounding fluids respectively. The developed methodology meets the requirements of both the complete system and component level modeling by using specially identified combination of different simulation techniques. Based on the created template model, a three-stage spur plus helical gearbox is constructed and simulated as an application example. In addition to the internal mechanical excitation mechanisms, the transmission model also includes the rotational and translational dynamics of the gears, shafts and bearings. It is followed by illustration of wear among the rotating components. Different kinds of static and dynamic analyses are performed and coupled at various levels depending on the mechanical complexities involved. Furthermore, the structure dynamic vibration of the housing and the associated sound particle radiations are mapped into the surrounding fluid. Additionally, the approach for selection of the potential parameters for optimization is depicted. Final part focuses on the measurements of different system states used for validation of the model. In the end, results obtained from both simulations and experiments are analyzed and assessed for there respective performances.

Deutschland knapp vor Spanien : der Mathematik-Didaktiker Matthias Ludwig hat mit fussballmathe.de den Ausgang der EM in Frankreich simuliert: das Ergebnis wird (deutsche) Fußballfans freuen (2016)

Frank, Dirk

Bayesian machine learning for financial modeling (2020)

Nirwan, Rajbir Singh

Machine Learning (ML) is so pervasive in our todays life that we don't even realise that, more often than expected, we are using systems based on it. It is also evolving faster than ever before. When deploying ML systems that make decisions on their own, we need to think about their ignorance of our uncertain world. The uncertainty might arise due to scarcity of the data, the bias of the data or even a mismatch between the real world and the ML-model. Given all these uncertainties, we need to think about how to build systems that are not totally ignorant thereof. Bayesian ML can to some extent deal with these problems. The specification of the model using probabilities provides a convenient way to quantify uncertainties, which can then be included in the decision making process. In this thesis, we introduce the Bayesian ansatz to modeling and apply Bayesian ML models in finance and economics. Especially, we will dig deeper into Gaussian processes (GP) and Gaussian process latent variable model (GPLVM). Applied to the returns of several assets, GPLVM provides the covariance structure and also a latent space embedding thereof. Several financial applications can be build upon the output of the GPLVM. To demonstrate this, we build an automated asset allocation system, a predictor for missing asset prices and identify other structure in financial data. It turns out that the GPLVM exhibits a rotational symmetry in the latent space, which makes it harder to fit. Our second publication reports, how to deal with that symmetry. We propose another parameterization of the model using Householder transformations, by which the symmetry is broken. Bayesian models are changed by reparameterization, if the prior is not changed accordingly. We provide the correct prior distribution of the new parameters, such that the model, i.e. the data density, is not changed under the reparameterization. After applying the reparametrization on Bayesian PCA, we show that the symmetry of nonlinear models can also be broken in the same way. In our last project, we propose a new method for matching quantile observations, which uses order statistics. The use of order statistics as the likelihood, instead of a Gaussian likelihood, has several advantages. We compare these two models and highlight their advantages and disadvantages. To demonstrate our method, we fit quantiled salary data of several European countries. Given several candidate models for the fit, our method also provides a metric to choose the best option. We hope that this thesis illustrates some benefits of Bayesian modeling (especially Gaussian processes) in finance and economics and its usage when uncertainties are to be quantified.

The number of satisfying assignments of random 2-SAT formulas (2021)

Achlioptas, Dimitris ; Coja-Oghlan, Amin ; Hahn-Klimroth, Maximilian Grischa ; Lee, Joon ; Müller, Noëla ; Penschuck, Manuel ; Zhou, Guangyan

We show that throughout the satisfiable phase the normalized number of satisfying assignments of a random 2-SAT formula converges in probability to an expression predicted by the cavity method from statistical physics. The proof is based on showing that the Belief Propagation algorithm renders the correct marginal probability that a variable is set to “true” under a uniformly random satisfying assignment.

On rates of convergence in the probabilistic analysis of algorithms (2021)

Straub, Jasmin

Within the last thirty years, the contraction method has become an important tool for the distributional analysis of random recursive structures. While it was mainly developed to show weak convergence, the contraction approach can additionally be used to obtain bounds on the rate of convergence in an appropriate metric. Based on ideas of the contraction method, we develop a general framework to bound rates of convergence for sequences of random variables as they mainly arise in the analysis of random trees and divide-and-conquer algorithms. The rates of convergence are bounded in the Zolotarev distances. In essence, we present three different versions of convergence theorems: a general version, an improved version for normal limit laws (providing significantly better bounds in some examples with normal limits) and a third version with a relaxed independence condition. Moreover, concrete applications are given which include parameters of random trees, quantities of stochastic geometry as well as complexity measures of recursive algorithms under either a random input or some randomization within the algorithm.

Are we there yet? - A systematic literature review on chatbots in education (2021)

Wollny, Sebastian ; Schneider, Jan ; Di Mitri, Daniele ; Weidlich, Joshua ; Rittberger, Marc ; Drachsler, Hendrik

Chatbots are a promising technology with the potential to enhance workplaces and everyday life. In terms of scalability and accessibility, they also offer unique possibilities as communication and information tools for digital learning. In this paper, we present a systematic literature review investigating the areas of education where chatbots have already been applied, explore the pedagogical roles of chatbots, the use of chatbots for mentoring purposes, and their potential to personalize education. We conducted a preliminary analysis of 2,678 publications to perform this literature review, which allowed us to identify 74 relevant publications for chatbots’ application in education. Through this, we address five research questions that, together, allow us to explore the current state-of-the-art of this educational technology. We conclude our systematic review by pointing to three main research challenges: 1) Aligning chatbot evaluations with implementation objectives, 2) Exploring the potential of chatbots for mentoring students, and 3) Exploring and leveraging adaptation capabilities of chatbots. For all three challenges, we discuss opportunities for future research.

The sketch map tool facilitates the assessment of OpenStreetMap data for participatory mapping (2021)

Klonner, Carolin ; Hartmann, Maximilian ; Dischl, Rebecca ; Djami, Lily ; Anderson, Liana ; Raifer, Martin ; Lima-Silva, Fernanda ; Castro Degrossi, Lívia ; Zipf, Alexander ; Albuquerque, João Porto de

A worldwide increase in the number of people and areas affected by disasters has led to more and more approaches that focus on the integration of local knowledge into disaster risk reduction processes. The research at hand shows a method for formalizing this local knowledge via sketch maps in the context of flooding. The Sketch Map Tool enables not only the visualization of this local knowledge and analyses of OpenStreetMap data quality but also the communication of the results of these analyses in an understandable way. Since the tool will be open-source and several analyses are made automatically, the tool also offers a method for local governments in areas where historic data or financial means for flood mitigation are limited. Example analyses for two cities in Brazil show the functionalities of the tool and allow the evaluation of its applicability. Results depict that the fitness-for-purpose analysis of the OpenStreetMap data reveals promising results to identify whether the sketch map approach can be used in a certain area or if citizens might have problems with marking their flood experiences. In this way, an intrinsic quality analysis is incorporated into a participatory mapping approach. Additionally, different paper formats offered for printing enable not only individual mapping but also group mapping. Future work will focus on advancing the automation of all steps of the tool to allow members of local governments without specific technical knowledge to apply the Sketch Map Tool for their own study areas.

Some results on the theory and practice of algorithms (2020)

Hammer, David

This thesis presents research which spans three conference papers and one manuscript which has not yet been submitted for peer review. The topic of 1 is the inherent complexity of maintaining perfect height in B-trees. We consider the setting in which a B-tree of optimal height contains n = (1−ϵ)N elements where N is the number of elements in full B-tree of the same height (the capacity of the tree). We show that the rebalancing cost when updating the tree—while maintaining optimal height—depends on ϵ. Specifically, our analysis gives a lower bound for the rebalancing cost of Ω(1/(ϵB)). We then describe a rebalancing algorithm which has an amortized rebalancing cost with an almost matching upper bound of O(1/(ϵB)⋅log²(min{1/ϵ,B})). We additionally describe a scheme utilizing this algorithm which, given a rebalancing budget f(n), maintains optimal height for decreasing ϵ until the cost exceeds the budget at which time it maintains optimal height plus one. Given a rebalancing budget of Θ(logn), this scheme maintains optimal height for all but a vanishing fraction of sizes in the intervals between tree capacities. Manuscript 2 presents empirical analysis of practical randomized external-memory algorithms for computing the connected components of graphs. The best known theoretical results for this problem are essentially all derived from results for minimum spanning tree algorithms. In the realm of randomized external-memory MST algorithms, the best asymptotic result has I/O-complexity O(sort(|E|)) in expectation while an empirically studied practical algorithm has a bound of O(sort(|E|)⋅log(|V|/M)). We implement and evaluate an algorithm for connected components with expected I/O-complexity O(sort(|E|))—a simplification of the MST algorithm with this asymptotic cost, we show that this approach may also yield good results in practice. In paper 3, we present a novel approach to simulating large-scale population protocol models. Naive simulation of N interactions of a population protocol with n agents and m states requires Θ(nlogm) bits of memory and Θ(N) time. For very large n, this is prohibitive both in memory consumption and time, as interesting protocols will typically require N > n interactions for convergence. We describe a histogram-based simulation framework which requires Θ(mlogn) bits of memory instead—an improvement as it is typically the case that n ≫ m. We analyze, implement, and compare a number of different data structures to perform correct agent sampling in this regime. For this purpose, we develop dynamic alias tables which allow sampling an interaction in expected amortized constant time. We then show how to use sampling techniques to process agent interactions in batches, giving a simulation approach which uses subconstant time per interaction under reasonable assumptions. With paper 4, we introduce the new model of fragile complexity for comparison-based algorithms. Within this model, we analyze classical comparison-based problems such as finding the minimum value of a set, selection (or finding the median), and sorting. We prove a number of lower and upper bounds and in particular, we give a number of randomized results which describe trade-offs not achievable by deterministic algorithms.

Anpassung von Alignments an Veränderungen in Ontologien (2021)

Jurisch, Matthias

Um Wissen in einer Form abzulegen, in der es automatisiert verarbeitet werden kann, werden unter anderem Ontologien verwendet. Ontologien erlauben über einen als Inferenz bezeichneten Prozess die Ableitung neuen Wissens. Bei inhaltlichen Überschneidungen werden Ontologien über Ontologie-Alignments miteinander verbunden, die Entitäten aus den verschiedenen Ontologien in Beziehung zueinander setzen. Üblicherweise werden diese Alignments als Mengen von Äquivalenzen formuliert, die beschreiben, welche Konzepte aus einer Ontologie Konzepten aus einer anderen Ontologie entsprechen. Ebenfalls verbreitet sind Ober- und Unterklassenbeziehungen in Alignments. Diese Ontologie-Alignments werden zum Beispiel in der Biomedizin in Forschungsdatenbanken verwendet, da durch Alignments Informationen aus verschiedenen Bereichen zusammengeführt werden können. Der manuelle Aufwand, um große Ontologien und Alignments zu erstellen, ist sehr hoch. Dementsprechend wäre es wünschenswert, bei einer Veränderung von Ontologien nicht wieder von vorne beginnen und eine neue Ontologie erstellen zu müssen und möglichst viel aus der veränderten Ontologie und den die Ontologie betreffenden Alignments wiederverwenden zu können. Daher sollten möglichst automatisierte Verfahren verwendet werden. Diese Arbeit untersucht vier Ansätze, um die Anpassung von Alignments an Veränderungen in Ontologien zu automatisieren. Der erste Ansatz bezieht Inferenzen in den Prozess zur Vorhersage von Alignment-Änderungen mit ein. Dazu werden die Inferenzen vor und nach der Änderung der Ontologien berechnet und auf Basis der Unterschiede mit einem regelbasierten Algorithmus bestimmt, wie sich das Alignment ändern soll. Der zweite Ansatz, wie auch die weiteren Ansätze, hat nicht zum Ziel das Alignment direkt anzupassen. Stattdessen soll vorhergesagt werden, welche Teile des Alignments angepasst werden müssen. Dazu werden die Ontologien und das Alignment als Wissensgraph-Embeddings repräsentiert. Diese Embeddings bilden Knoten aus den Ontologien in einen Raum mit 300-1000 Dimensionen so ab, dass in dem Raum auch die Beziehungen zwischen den Entitäten der Ontologien repräsentiert werden können. Diese Embeddings werden dann verwendet, um verschiedene Klassifikationsalgorithmen zu trainieren. Auf diese Weise wird vorhergesagt, welche Teile des Alignments sich verändern werden. Der dritte Ansatz verbindet Embeddings mit einem Veränderungsmodell. Das Veränderungsmodell kategorisiert die an den Ontologien vorgenommenen Veränderungen. Auf diese Kategorisierung und das Embedding werden dann Klassifikationsalgorithmen angewandt. Der vierte Ansatz verwendet eine speziell auf Wissensgraphen ausgerichtete Architektur für neuronale Netze, sogenannte Graph Convolutional Networks, um Veränderungen an Alignments vorher zu sagen. Diese Ansätze werden auf ihre jeweiligen Vor- und Nachteile untersucht. Dazu werden die Verfahren an zwei Anwendungsfällen untersucht. Der Ansatz zur regelbasierten Einbeziehung von Inferenzen wird anhand eines Anwendungsbeispiels aus dem Bereich der Interweaving Systems betrachtet. In dem Beispiel wird eine allgemeine Methode für Interweaving Systems angewandt um das Selbstmanagement von Ampelsteuerungen zu ermöglichen. Die auf maschinellem Lernen aufbauenden Ansätze werden auf einem Auszug aus der biomedizinischen Forschungsdatenbank UMLS evaluiert. Dabei konnte festgestellt werden, dass die betrachteten Ansätze grundsätzlich zur Anpassung von Alignments an Ontologie-Veränderungen eingesetzt werden können. Der Ansatz zur regelbasierten Einbeziehung von Inferenzen kann dabei vor allem auf sehr kleinen Datensätzen eingesetzt werden, bei denen alle Gesetzmäßigkeiten der Veränderungen grundsätzlich bekannt sind. Diese Anwendbarkeit ergibt sich aus dem Entwurf der Problemstellung für den ersten Ansatz. Die auf maschinellem Lernen aufbauenden Ansätze eignen sich besonders für große Datensätze und bieten den Vorteil, dass auch ohne ein vollständiges Verständnis des Veränderungsprozesses Vorhersagen getroffen werden können. Unter den Ansätzen, die maschinelles Lernen einsetzen, zeigte die Einbeziehung von Veränderungsmodellen keine Vorteile gegenüber den anderen Ansätzen. Auf einem etwas kleineren Datensatz waren die Ergebnisse des Embedding-basierten Ansatzes und der Relational Graph Convolutional Networks vergleichbar, während auf einem größeren Datensatz die Graph Convolutional Networks etwas bessere Ergebnisse erreichen konnten. Weitere Ergebnisse dieser Arbeit stellen eine Formalisierung der Problemstellung der Anpassung von Ontologie-Alignments an Veränderungen sowie eine formale Darstellung der Ansätze dar. Ein weiterer Beitrag der Arbeit ist die Vorstellung eines Anwendungsfalls aus dem Bereich der Interweaving Systems für Ontologie-Alignments. Außerdem wurde das Problem der Anpassung von Alignments an Veränderungen so formuliert, dass es mithilfe von maschinellem Lernen betrachtet werden kann.

Principles of cognitive maps (2021)

Živković, Đurđica

This thesis analyses the concept of a cognitive map in the research fields of geography. Cognitive mapping research is essential as it investigates the relations between cognitive maps and external representations of space that people regularly use by acquiring spatial knowledge, such as maps in geographic information systems. Moreover, cognitive maps, when expanded on semantic maps, explain the relations between people and things in a non-physically environment, where the considered space is not spanned by distance but with other non-spatially variables. Nevertheless, cognitive maps are often distorted. Although a good formation of a cognitive map is vital in navigation processes, cognitive distortions are barely investigated in the field of geography. By analyzing the relevant work, especially Tobler’s first law of geography, a new lexical variant of Tobler’s first law could be stated that could presumably describe a specific distortion in the processing of landmarks in cognitive maps.

Teaching from a distance - math lessons during COVID-19 in Germany and Spain (2021)

Barlovits, Simon ; Jablonski, Simone ; Lázaro, Claudia ; Ludwig, Matthias ; Recio, Tomas

In 2020, Germany and Spain experienced lockdowns of their school systems. This resulted in a new challenge for learners and teachers: lessons moved from the classroom to the children’s homes. Therefore, teachers had to set rules, implement procedures and make didactical–methodical decisions regarding how to handle this new situation. In this paper, we focus on the roles of mathematics teachers in Germany and Spain. The article first describes how mathematics lessons were conducted using distance learning. Second, problems encountered throughout this process were examined. Third, teachers drew conclusions from their mathematics teaching experiences during distance learning. To address these research interests, a questionnaire was answered by N = 248 teachers (N1 = 171 German teachers; N2 = 77 Spanish teachers). Resulting from a mixed methods approach, differences between the countries can be observed, e.g., German teachers conducted more lessons asynchronously. In contrast, Spanish teachers used synchronous teaching more frequently, but still regard the lack of personal contact as a main challenge. Finally, for both countries, the digitization of mathematics lessons seems to have been normalized by the pandemic.

Designing deep neural networks for continual learning in an open world (2021)

Mundt, Martin

Deep learning with neural networks seems to have largely replaced traditional design of computer vision systems. Automated methods to learn a plethora of parameters are now used in favor of previously practiced selection of explicit mathematical operators for a specific task. The entailed promise is that practitioners no longer need to take care of every individual step, but rather focus on gathering big amounts of data for neural network training. As a consequence, both a shift in mindset towards a focus on big datasets, as well as a wave of conceivable applications based exclusively on deep learning can be observed. This PhD dissertation aims to uncover some of the only implicitly mentioned or overlooked deep learning aspects, highlight unmentioned assumptions, and finally introduce methods to address respective immediate weaknesses. In the author’s humble opinion, these prevalent shortcomings can be tied to the fact that the involved steps in the machine learning workflow are frequently decoupled. Success is predominantly measured based on accuracy measures designed for evaluation with static benchmark test sets. Individual machine learning workflow components are assessed in isolation with respect to available data, choice of neural network architecture, and a particular learning algorithm, rather than viewing the machine learning system as a whole in context of a particular application. Correspondingly, in this dissertation, three key challenges have been identified: 1. Choice and flexibility of a neural network architecture. 2. Identification and rejection of unseen unknown data to avoid false predictions. 3. Continual learning without forgetting of already learned information. These latter challenges have already been crucial topics in older literature, alas, seem to require a renaissance in modern deep learning literature. Initially, it may appear that they pose independent research questions, however, the thesis posits that the aspects are intertwined and require a joint perspective in machine learning based systems. In summary, the essential question is thus how to pick a suitable neural network architecture for a specific task, how to recognize which data inputs belong to this context, which ones originate from potential other tasks, and ultimately how to continuously include such identified novel data in neural network training over time without overwriting existing knowledge. Thus, the central emphasis of this dissertation is to build on top of existing deep learning strengths, yet also acknowledge mentioned weaknesses, in an effort to establish a deeper understanding of interdependencies and synergies towards the development of unified solution mechanisms. For this purpose, the main portion of the thesis is in cumulative form. The respective publications can be grouped according to the three challenges outlined above. Correspondingly, chapter 1 is focused on choice and extendability of neural network architectures, analyzed in context of popular image classification tasks. An algorithm to automatically determine neural network layer width is introduced and is first contrasted with static architectures found in the literature. The importance of neural architecture design is then further showcased on a real-world application of defect detection in concrete bridges. Chapter 2 is comprised of the complementary ensuing questions of how to identify unknown concepts and subsequently incorporate them into continual learning. A joint central mechanism to distinguish unseen concepts from what is known in classification tasks, while enabling consecutive training without forgetting or revisiting older classes, is proposed. Once more, the role of the chosen neural network architecture is quantitatively reassessed. Finally, chapter 3 culminates in an overarching view, where developed parts are connected. Here, an extensive survey further serves the purpose to embed the gained insights in the broader literature landscape and emphasizes the importance of a common frame of thought. The ultimately presented approach thus reflects the overall thesis’ contribution to advance neural network based machine learning towards a unified solution that ties together choice of neural architecture with the ability to learn continually and the capability to automatically separate known from unknown data.

Additive kinematic formulas for flag area measures (2021)

Abardia-Evéquoz, Judit ; Bernig, Andreas

We show the existence of additive kinematic formulas for general flag area measures, which generalizes a recent result by Wannerer. Building on previous work by the second named author, we introduce an algebraic framework to compute these formulas explicitly. This is carried out in detail in the case of the incomplete flag manifold consisting of all (p+1)-planes containing a unit vector.

Statistical inference on random factor graphs (2021)

Loick, Philipp Johannes Michael

Masur–Veech volume of the gothic locus (2020)

Torres-Teigell, David

We calculate the Masur–Veech volume of the gothic locus G in the stratum H(23) of genus 4. Our method is based on the use of the formulae for the Euler characteristics of gothic Teichmu ̈ller curves to determine the number of lattice points of given area. We also use this method to recal- culate the Masur–Veech volumes of the Prym loci P3 ⊂ H(4) and P4 ⊂ H(6) in genus 3 and 4.

Towards automatic collaboration analytics for group speech data using learning analytics (2021)

Praharaj, Sambit ; Scheffel, Maren ; Schmitz, Marcel ; Specht, Marcus ; Drachsler, Hendrik

Collaboration is an important 21st Century skill. Co-located (or face-to-face) collaboration (CC) analytics gained momentum with the advent of sensor technology. Most of these works have used the audio modality to detect the quality of CC. The CC quality can be detected from simple indicators of collaboration such as total speaking time or complex indicators like synchrony in the rise and fall of the average pitch. Most studies in the past focused on “how group members talk” (i.e., spectral, temporal features of audio like pitch) and not “what they talk”. The “what” of the conversations is more overt contrary to the “how” of the conversations. Very few studies studied “what” group members talk about, and these studies were lab based showing a representative overview of specific words as topic clusters instead of analysing the richness of the content of the conversations by understanding the linkage between these words. To overcome this, we made a starting step in this technical paper based on field trials to prototype a tool to move towards automatic collaboration analytics. We designed a technical setup to collect, process and visualize audio data automatically. The data collection took place while a board game was played among the university staff with pre-assigned roles to create awareness of the connection between learning analytics and learning design. We not only did a word-level analysis of the conversations, but also analysed the richness of these conversations by visualizing the strength of the linkage between these words and phrases interactively. In this visualization, we used a network graph to visualize turn taking exchange between different roles along with the word-level and phrase-level analysis. We also used centrality measures to understand the network graph further based on how much words have hold over the network of words and how influential are certain words. Finally, we found that this approach had certain limitations in terms of automation in speaker diarization (i.e., who spoke when) and text data pre-processing. Therefore, we concluded that even though the technical setup was partially automated, it is a way forward to understand the richness of the conversations between different roles and makes a significant step towards automatic collaboration analytics.

Large discrete structures : statistical inference, combinatorics and limits (2021)

Hahn-Klimroth, Maximilian Grischa

Studying large discrete systems is of central interest in, non-exclusively, discrete mathematics, computer sciences and statistical physics. The study of phase transitions, e.g. points in the evolution of a large random system in which the behaviour of the system changes drastically, became of interest in the classical field of random graphs, the theory of spin glasses as well as in the analysis of algorithms [78,82, 121]. It turns out that ideas from the statistical physics’ point of view on spin glass systems can be used to study inherently combinatorial problems in discrete mathematics and theoretical computer sciences(for instance, satisfiability) or to analyse phase transitions occurring in inference problems (like the group testing problem) [68, 135, 168]. A mathematical flaw of this approach is that the physical methods only render mathematical conjectures as they are not known to be rigorous. In this thesis, we will discuss the results of six contributions. For instance, we will explore how the theory of diluted mean-field models for spin glasses helps studying random constraint satisfaction problems through the example of the random 2−SAT problem. We will derive a formula for the number of satisfying assignments that a random 2−SAT formula typically possesses [2]. Furthermore, we will discuss how ideas from spin glass models (more precisely, from their planted versions) can be used to facilitate inference in the group testing problem. We will answer all major open questions with respect to non-adaptive group testing if the number of infected individuals scales sublinearly in the population size and draw a complete picture of phase transitions with respect to the complexity and solubility of this inference problem [41, 46]. Subsequently, we study the group testing problem under sparsity constrains and obtain a (not fully understood) phase diagram in which only small regions stay unexplored [88]. In all those cases, we will discover that important results can be achieved if one combines the rich theory of the statistical physics’ approach towards spin glasses and inherent combinatorial properties of the underlying random graph. Furthermore, based on partial results of Coja-Oghlan, Perkins and Skubch [42] and Coja-Oghlan et al. [49], we introduce a consistent limit theory for discrete probability measures akin to the graph limit theory [31, 32, 128] in [47]. This limit theory involves the extensive study of a special variant of the cut-distance and we obtain a continuous version of a very simple algorithm, the pinning operation, which allows to decompose the phase space of an underlying system into parts such that a probability measure, restricted to this decomposition, is close to a product measure under the cut-distance. We will see that this pinning lemma can be used to rigorise predictions, at least in some special cases, based on the physical idea of a Bethe state decomposition when applied to the Boltzmann distribution. Finally, we study sufficient conditions for the existence of perfect matchings, Hamilton cycles and bounded degree trees in randomly perturbed graph models if the underlying deterministic graph is sparse [93].

First passage percolation in the mean field limit (2020)

Schertzer, Adrien

This dissertation deals with two classical problems in statistical mechanics: the first passage percolation on Euclidean spaces, FPP for short, in both directed and undirected settings.

Scalable generation of random graphs (2020)

Penschuck, Manuel

Netzwerkmodelle spielen in verschiedenen Wissenschaftsdisziplinen eine wichtige Rolle und dienen unter anderem der Beschreibung realistischer Graphen. Sie werden häufig als Zufallsgraphen formuliert und stellen somit Wahrscheinlichkeitsverteilungen über Graphen dar. Meist ist die Verteilung dabei parametrisiert und ergibt sich implizit, etwa über eine randomisierten Konstruktionsvorschrift. Ein früher Vertreter ist das G(n,p) Modell, welches über allen ungerichteten Graphen mit n Knoten definiert ist und jede Kante unabhängig mit Wahrscheinlichkeit p erzeugt. Ein aus G(n,p) gezogener Graph hat jedoch kaum strukturelle Ähnlichkeiten zu Graphen, die zumeist in Anwendungen beobachtet werden. Daher sind populäre Modelle so gestaltet, dass sie mit hinreichend hoher Wahrscheinlichkeit gewünschte topologische Eigenschaften erzeugen. Beispielsweise ist es ein gängiges Ziel die nur unscharf definierte Klasse der sogenannten komplexen Netzwerke nachzubilden, der etwa viele soziale Netze zugeordnet werden. Unter anderem verfügen diese Graphen in der Regel über eine Gradverteilung mit schweren Rändern (heavy-tailed), einen kleinen Durchmesser, eine dominierende Zusammenhangskomponente, sowie über überdurchschnittlich dichte Teilbereiche, sogenannte Communities. Die Einsatzmöglichkeiten von Netzwerkmodellen gehen dabei weit über das ursprüngliche Ziel, beobachtete Effekte zu erklären, hinaus. Ein gängiger Anwendungsfall besteht darin, Daten systematisch zu produzieren. Solche Daten ermöglichen oder unterstützen experimentelle Untersuchungen, etwa zur empirischen Verifikation theoretischer Vorhersagen oder zur allgemeinen Bewertung von Algorithmen und Datenstrukturen. Hierbei ergeben sich insbesondere für große Probleminstanzen Vorteile gegenüber beobachteten Netzen. So sind massive Eingaben, die auf echten Daten beruhen, oft nicht in ausreichender Menge verfügbar, nur aufwendig zu beschaffen und zu verwalten, unterliegen rechtlichen Beschränkungen, oder sind von unklarer Qualität. In der vorliegenden Arbeit betrachten wir daher algorithmische Aspekte der Generierung massiver Zufallsgraphen. Um Anwendern Reproduzierbarkeit mit vorhandenen Studien zu ermöglichen, fokussieren wir uns hierbei zumeist auf getreue Implementierungen etablierter Netzwerkmodelle, etwa Preferential Attachment-Prozesse, LFR, simple Graphen mit vorgeschriebenen Gradsequenzen, oder Graphen mit hyperbolischer (o.Ä.) Einbettung. Zu diesem Zweck entwickeln wir praktisch sowie analytisch effiziente Generatoren. Unsere Algorithmen sind dabei jeweils auf ein geeignetes Maschinenmodell hin optimiert. Hierzu entwerfen wir etwa klassische sequentielle Generatoren für Registermaschinen, Algorithmen für das External Memory Model, und parallele Ansätze für verteilte oder Shared Memory-Maschinen auf CPUs, GPUs, und anderen Rechenbeschleunigern.

Regularising linear inverse problems under unknown non-Gaussian noise (2020)

Jahn, Tim Nikolas

Diese Arbeit beschäftigt sich mit linearen inversen Problemen, wie sie in einer Vielzahl an Anwendungen auftreten. Diese Probleme zeichnen sich dadurch aus, dass sie typischerweise schlecht gestellt sind, was in erster Linie die Stabilität betrifft. Selbst kleinste Messfehler haben enorme Konsequenzen für die Rekonstruktion der zu bestimmenden Größe. Um eine robuste Rekonstruktion zu ermöglichen, muss das Problem regularisiert, dass heißt durch eine ganze Familie abgeänderter, stabiler Approximationen ersetzt werden. Die konkrete Wahl aus der Familie, die sogenannte Parameterwahlstrategie, stützt sich dann auf zusätzliche ad hoc Annahmen über den Messfehler. Typischerweise ist dies im deterministischen Fall die Kenntnis einer oberen Schranke an die Norm des Datenfehlers, oder im stochastischen Fall, die Kenntnis der Verteilung des Fehlers, beziehungsweise die Einschränkung auf eine bestimmte Klasse von Verteilungen, zumeist Gaußsche. In der vorliegenden Arbeit wird untersucht, wie sich diese Informationen unter der Annahme der Wiederholbarkeit der Messung gewinnen lassen. Die Daten werden dabei aus mehreren Messungen gemittelt, welche einer beliebigen, unbekannten Verteilung folgen, wobei die zur Lösung des Problems unweigerlich notwendige Fehlerschranke geschätzt wird. Auf Mittelwert und Schätzer wird dann ein klassisches Regularisierungsverfahren angewandt. Als Regularisierungen werden größtenteils Filter-basierte Verfahren behandelt, die sich auf die Spektralzerlegung des Problems stützen. Als Parameterwahlstrategien werden sowohl einfache a priori-Wahlen betrachtet, als auch das Diskrepanzprinzip als adaptives Verfahren. Es wird Konvergenz für unbekannte beliebige Fehlerverteilungen mit endlicher Varianz sowie für Weißes Rauschen (bezüglich allgemeiner Diskretisierungen) nachgewiesen. Schließlich wird noch die Konvergenz des Diskrepanzprinzips für ein stochastisches Gradientenverfahren gezeigt, als erste rigorose Analyse einer adaptiven Stoppregel für ein solches nicht Filter-basiertes Regularisierungsverfahren.

Ordnung des Fachbereichs Informatik und Mathematik der Johann Wolfgang Goethe-Universität Frankfurt am Main für den Masterstudiengang Informatik mit dem Abschluss „Master of Science (M. Sc.)“ vom 17. Juni 2019 : hier: Änderung vom 19. Januar 2021 ; genehmigt vom Präsidium am 9. März 2021 (2021)

Ordnung des Fachbereichs Informatik und Mathematik der Johann Wolfgang Goethe-Universität Frankfurt am Main für den Bachelorstudiengang Informatik mit dem Abschluss „Bachelor of Science (B. Sc.)“ vom 17. Juni 2019 : hier: Änderung vom 19. Januar 2021 ; genehmigt vom Präsidium am 9. März 2021 (2021)

MathCityMap - eine Bildungs-App für mathematische Wanderpfade : theoretische Grundlagen, Entwicklung und Evaluation (2020)

Gurjanow, Iwan

Diese Arbeit beschäftigt sich mit der theoriegeleiteten Entwicklung eines digitalen Werkzeugs namens MathCityMap (MCM) für das außerschulische Lehren und Lernen von Mathematik. Den Ausgangspunkt des Projekts bilden die sogenannten Mathtrails. Dies sind Wanderpfade zum Entdecken mathematischer Sachverhalte an realen Objekten in der Umwelt. Eine didaktische, methodische sowie lernpsychologische Analyse konstatiert Mathtrails zahlreiche Potentiale für den Lernprozess wie beispielsweise die Möglichkeit, Primärerfahrungen zu sammeln, das Interesse am Fach Mathematik zu steigern sowie das Lernen aktiv und konstruktiv zu gestalten. Trotz der genannten Vorteile wird deutlich, dass die Vorbereitung und Umsetzung der mathematischen Wanderpfade mit einem immensen Aufwand verbunden sind. Eine weitere Herausforderung für Lernende liegt im offenen Charakter der Mathtrails, die in der Regel in autonomen Kleingruppen abgelaufen werden. Aus der Literatur ist bekannt, dass insbesondere für schwächere Lerner die Gefahr besteht, durch die Anforderungen einer selbstständigen Arbeitsweise überfordert zu werden. Als Lösungsansatz für die zuvor genannten Probleme wird im Rahmen dieser Arbeit die Entwicklung eines digitalen Werkzeugs für Mathtrails erläutert. Die erste Forschungsfrage beschäftigt sich mit den theoretischen Anforderungen an solch ein Tool: 1. Welchen Anforderungen muss ein digitales Werkzeug genügen, um die Vorzüge der Mathtrails zu erhalten, deren Aufwand zu minimieren und die Gefahren zu kompensieren? Unter Berücksichtigung der theoretischen Grundlagen digitaler Werkzeuge und des „Mobile Learnings“ werden zunächst Möglichkeiten identifiziert, den Vorbereitungsaufwand zu minimieren. Konkret erscheinen die automatische Datenverarbeitung, das digitale Zusammen-arbeiten sowie das Teilen und Wiederverwenden von digitalen Aufgaben und Trails als theoretisch zielführende Bestandteile von MCM. Weiterhin sollen zur Unterstützung der Lerner bei der eigenständigen Bearbeitung von Mathtrails didaktisch bewährte Konzepte – wie gestufte Hilfestellungen und Feedback – eingesetzt werden. Vor dem Hintergrund der soeben formulierten Anforderungen bilden der Entwicklungsprozess sowie die Beschreibung des aktuellen Ist-Zustandes des MCM-Systems zentrale Bestand-teile dieser Arbeit. Das System setzt sich aus zwei Komponenten für jeweils unterschiedliche Zielgruppen zusammen: das MCM-Webportal zum Erstellen von Mathtrails und die MCM-App zum Ablaufen selbiger. Die Hauptziele von MCM können in der Minimierung des Vorbereitungsaufwands sowie der Kompensation einer Überforderungsgefahr gesehen werden. In ersten Feldversuchen konnte MCM bereits in einem frühen Stadium erfolgreich mit Lernenden der Sekundarstufe I getestet werden. Gleichzeitig fiel jedoch auf, dass das implementierte Feedback-System Schwächen aufwies und von Lernenden zum systematischen Erraten von Lösungen genutzt werden konnte. In der Folge wurden Spielelemente (Gamification), denen nicht nur eine motivationssteigernde Wirkung nachgesagt wird, sondern auch das Potential das Verhalten zu beeinflussen, Bestandteil der MCM-App. Die zweite Forschungs-frage dieser Arbeit zielt auf die Auswirkungen der Gamification-Integration ab und lautet: 2. Welchen Einfluss haben Gamification-Elemente auf die Motivation sowie auf das Nutzungs-verhalten des digitalen Werkzeugs von Neuntklässlern bei der Bearbeitung eines Mathtrails? Zur Beantwortung der zweiten Forschungsfrage wurde eine empirische Studie mit 16 Schulklassen (304 Schülerinnen und Schüler) der neunten Jahrgangsstufe im Sommer 2017 durch-geführt. Die Ergebnisse können wie folgt zusammengefasst werden: Die Implementierung einer Rangliste (Leaderboard) in die MCM-App führte zwar nicht zu einer höheren Motivation, jedoch spornte der Wettbewerb die Teilnehmer an, viele Aufgaben zu bearbeiten. Im Ver-gleich zu der Kontrollgruppe ohne Gamification-Elemente löste die Experimentalgruppe signifikant mehr Aufgaben, legte die doppelte Strecke zurück und nutzte das Feedbacks-System seltener aus, um Lösungen zu erraten. Die Studie konnte empirisch den gewünschten Einfluss von Spielelementen auf die Benutzung eines digitalen Werkzeugs für das außerschulische Lernen von Mathematik aufzeigen. Die Evaluation der Ziele von MCM erfolgt indirekt über die Analyse der Verbreitung der Mathtrail-Idee ohne MCM und mit MCM. Die dritte Forschungsfrage lautet dementsprechend: 3. Welchen Beitrag hat das digitale Werkzeug zur Verbreitung der Mathtrail-Idee nach 4 Jahren Projektlaufzeit geleistet? Zur Beantwortung der dritten Forschungsfrage werden wissenschaftliche Publikationen zu Mathtrails analysiert. Es wird insbesondere in Publikationen mit und ohne Stichwort „MathCityMap“ unterschieden, um eine Aussage über den Einfluss des MCM-Projekts auf den wissenschaftlichen Diskurs treffen zu können. Stand August 2020 enthält bereits jede dritte Mathtrail-Publikation einen Bezug zu MCM. Weiterhin wird ein Vergleich zu vorherigen, ähnlichen Bemühungen – gemeint sind Online-Mitmach-Projekte für Mathtrails – gezogen. So existierten im Zeitraum 2000 bis 2010 im anglo-amerikanischen Raum erste Webseiten für mathematische Wanderpfade. Diese boten zusammengenommen 131 Mathtrails an. Im Vergleich hierzu existieren bereits über 2.500 MCM-Mathtrails in 57 Ländern. Sowohl die Publikationen als auch die Anzahl der erstellten Trails stellen erste Indizien dafür dar, dass mit MCM die Realisation eines theoretischen Konzepts für ein digitales Mathtrail-Werkzeug gelungen ist und die Idee der Mathtrails verbreitet werden konnte.

Educational texttechnology : quantifying task descriptions in a multilingual setting (2020)

Konca, Maksims

This thesis explores a variety of methods of text quantification applicable in the field of educational text technology. Besides the cohort of existing linguistic, lexical, syntactic, and semantic text quantification methods, additional methods based on Bidirectional Encoder Representations from Transformers (BERT) are introduced and analysed. The model, developed in this thesis, is tested on a multilingual data composed of task descriptions used in Test of Understanding in College Economics (TUCE). Quantitative features extracted from raw textual data are analysed using an array of evaluation methods with the goal of finding the best predictors of the target variable - the rate of correct student responses in TUCE.

Requirements engineering and tool-support for security and privacy (2020)

Pape, Sebastian

In order to address security and privacy problems in practice, it is very important to have a solid elicitation of requirements, before trying to address the problem. In this thesis, specific challenges of the areas of social engineering, security management and privacy enhancing technologies are analyzed: Social Engineering: An overview of existing tools usable for social engineering is provided and defenses against social engineering are analyzed. Serious games are proposed as a more pleasant way to raise employees’ awareness and to train them. Security Management: Specific requirements for small and medium sized energy providers are analyzed and a set of tools to support them in assessing security risks and improving their security is proposed. Larger enterprises are supported by a method to collect security key performance indicators for different subsidiaries and with a risk assessment method for apps on mobile devices. Furthermore, a method to select a secure cloud provider – the currently most popular form of outsourcing – is provided. Privacy Enhancing Technologies: Relevant factors for the users’ adoption of privacy enhancing technologies are identified and economic incentives and hindrances for companies are discussed. Privacy by design is applied to integrate privacy into the use cases e-commerce and internet of things.

Automatische Disambiguation für das Englische (2018)

Schulz, Clemens

Begriffe sind häufig nicht eindeutig. Eine „Bank“ kann ein Finanzinstitut oder eine Sitzgelegenheit sein und die Stadt Frankfurt existiert mehr als einmal. Dennoch können sie in vielen Fällen problemlos von Menschen unterschieden werden. Computer sind noch nicht in der Lage, diese Leistung mit vergleichbarer Genauigkeit zu erfüllen. Der in dieser Arbeit vorgestellte Ansatz baut auf dem für das Deutsche bereits gute Ergebnisse erzielenden fastSense auf und verwendet ein neuronales Netz, um Namen und Begriffe in englischen Texten mit Hilfe der Wikipedia zu disambiguieren. Dabei konnte eine Genauigkeit von bis zu 89,5% auf Testdaten erreicht werden. Mit dem entwickelten Python-Modul kann das trainierte Modell in bestehende Anwendungen eingebunden werden. Die im Modul enthaltenen Programme ermöglichen es, neue Modelle zu trainieren und zu testen.

Deutschland startet als Mitfavorit in die Fußball-WM : Matthias Ludwig hat mit seinem Projekt fussballmathe.de die Prognosen aller Teams berechn (2018)

IsoSpace basierter Avatareditor in Virtual Reality als Teil des VAnnotatoR (2020)

Dächert, Paul

In der aktuellen Zeit gibt es eine Vielzahl an annotierten Texten und anderen Medien. Genauso gibt es verschiedenste Möglichkeiten neue Texte zu annotieren, sowohl manuell als auch automatisch. Es gibt Systeme, die diese Annotationen in andere, visuell ansprechendere Medien umwandeln. Zu diesen Systemen gehören auch die Text2Scene Systeme, dort wird ein annotierter Text in eine dreidimensionale Szene umgewandelt. Ein Teil dieser Text2Scene Systeme können auch Personen durch Modelle von Menschen darstellen, aber bis jetzt gibt es noch kein System, dass Avatar Modelle selber synthetisieren kann. Der Fokus dieser Arbeit liegt sowohl darauf eine Schnittstelle bereitzustellen, mit der Avatare mit bestimmten Parametern erstellt werden können, als auch die Möglichkeit diese Avatare in der virtuellen Realität anzuzeigen und zu bearbeiten. Man kann in einer virtuellen Szene die Eigenschaften bestimmter Körperteile anpassen und die Kleidung der Avatare auswählen.

The p-adic section conjecture for localisations of curves (2020)

Lüdtke, Martin

The $p$-adic section conjecture predicts that for a smooth, proper, hyperbolic curve $X$ over a $p$-adic field $k$, every section of the map of étale fundamental groups $\pi_1(X) \to G_k$ is induced by a unique $k$-rational point of $X$. While this conjecture is still open, the birational variant in which $X$ is replaced by its generic point is known due to Koenigsmann. Generalising an alternative proof of Pop, we extend this result to certain localisations of $X$ at a set of closed points $S$, an intermediate version in between the full section conjecture and its birational variant. As one application, we prove the section conjecture for $X_S$ whenever $S$ is a countable set of closed points.

MobileAnnotator : eine App für den TextAnnotator (2020)

Adeberg, Pascal

Der Inhalt dieser Arbeit ist die Entwicklung und Evaluation einer mobilen Webanwendung für die Annotation von Texten. Dem Benutzer ist es durch diese Webanwendung, im folgenden auch MobileAnnotator genannt, möglich Wörter und Textausschnitte zu kategorisieren oder auch mit Wissensquellen, zum Beispiel Wikipedia, zu verknüpfen. Der MobileAnnotator ist dabei für mobile Endgeräte ausgelegt und insbesondere für Smartphones optimiert worden. Für die Funktionalität verwendet der MobileAnnotator die Architektur des bereits existierenden und etablierten TextAnnotators. Dieser stellt bereits eine Vielzahl von Annotations Werkzeugen bereit, von denen zwei auf den MobileAnnotator übertragen wurden. Da der TextAnnotator vollständig für einen Desktopbetrieb ausgelegt wurde, ist es jedoch nicht möglich diese Werkzeuge ohne Anpassungen für ein mobiles Gerät umzubauen. Der MobileAnnotator beschränkt sich somit auf ein Mindestmaß an Funktionen dieser Werkzeuge um sie dem Benutzer in geeigneter Art und Weise verfügbar zu machen. Für die Evaluation der Benutzerfreundlichkeit des MobileAnnotator und dessen Werkzeuge wurde anschließend eine Studie durchgeführt. Den Probanten war es innerhalb der Studie möglich Aussagen über die Bedienbarkeit des MobileAnnotators zu treffen und einen Vergleich zwischen dem Mobile- und TextAnnotator zu ziehen.

Design, development, and experimental assessment of a highly-reliable, radiation-tolerant readout system for the upgrade of the ALICE inner tracker (2020)

Lupi, Matteo

A Large Ion Collider Experiment (ALICE) is one of the four large experiments at the Large Hadron Collider (LHC) at the European Organization for Particle Physics (CERN). ALICE focuses on the physics of the strong interaction and in particular on the Quark-Gluon Plasma. This is a state of matter in which quarks are de-confined. It is believed that it existed in the earliest moments of the evolution of the universe. The ALICE detector studies the products of the collisions between heavy-nuclei, between protons, and between protons and heavy-nuclei. The sub-detector closest to the interaction point is the Inner Tracking System (ITS), which is used to measure the momentum and trajectory of the particles generated by the collisions and allows reconstructing primary and secondary interaction vertices. The ITS needs to have an accurate spatial resolution, together with a low material budget to limit the effect of multiple scattering on low-energetic particles to precisely reconstruct their trajectory. During the Long Shutdown 2 (2019-2020) of the LHC, the current ITS will be replaced by a completely redesigned sub-detector, which will improve readout rate and particle tracking performance especially at low-momentum. The ALice PIxel DEtector (ALPIDE) chip was designed to meet the requirements of the upgraded ITS in terms of resolution, material budget, radiation hardness, and readout rate. The ALPIDE chip is a Monolithic Active Pixel Sensor (MAPS) realised in Complementary Metal-Oxide Semiconductor (CMOS) technology. Sensing element, analogue front-end, and its digital readout are integrated into the same silicon die. The readout architecture of the new ITS foresees that data is transmitted via a high-speed serial link directly from the ALPIDE to the off-detector electronics. The data is transmitted off-chip by a so-called Data Transmission Unit (DTU) which needs to be tolerant to Single-Event Effects induced by radiation, in order to guarantee reliable operation. The ALPIDE chip will operate in a radiation field with a High-Energy Hadron peak flux of 7.7·10^5 cm^-2s^-1. The data are sent by the ALPIDE on copper cables to the readout system, which aggregates them and re-transmits them via optical fibres to the counting room. The position where the readout electronics will be placed is constrained by the maximum transmission distance reasonably achievable by the ALPIDE Data Transmission Unit and mechanical constraints of the ALICE experiment. The radiation field at that location is not negligible for its effects on electronics: the high-energy hadrons flux can reach 10^3 cm^-2s^-1. Static RAM (SRAM)-based Field Programmable Gate Arrays (FPGAs) are favoured over Application Specific Integrated Circuits (ASICs) or Radiation Hard by Design (RHBD) commercial devices because of cost effectiveness. Moreover, SRAM-based FPGAs are re-configurable and provide the data throughput required by the ITS. The main issue with SRAM-based FPGAs, for the intended application, is the susceptibility of their Configuration RAM (CRAM) to Single-Event Upsets: the number of CRAM bits is indeed much higher than the logic they configure. Total Ionizing Dose (TID) at the readout designed position is indeed still acceptable for Component Off The Shelf (COTS), provided that proper verification is carried out. This dissertation focuses on two parts of the design of the readout system: the Data Transmission Unit of the ALPIDE chip and the design of fundamental modules for the SRAM-based FPGA of the readout electronics. In the first part, a module of the Data Transmission Unit is designed, optimising the trade-off between power consumption, radiation tolerance, and jitter performance. The design was tested and thoroughly characterised, including tests while under irradiation with a 30 MeV protons. Furthermore the Data Transmission Unit performance was validated after the integration into the first prototypes of ITS modules. In the second part, the problem of developing a radiation-tolerant SRAM-based FPGA design is investigated and a solution is provided. First, a general methodology for designing radiation-tolerant Finite State Machines in SRAM-based FPGAs is analysed, implemented, and verified. Later, the radiation-tolerant FPGA design for the ITS readout is described together with the radiation effects mitigation techniques that were selectively applied to the different modules. The design was tested with multiple irradiation tests and the results are stated below.

Distributed annotation in virtual reality (2020)

Lööck, Simon

The main goal of this work was to create a network environment for the Unity Engine project StolperwegeVR, developed by the Text Technology Lab of Goethe University, in which you will be able to annotate one to several documents in a group. For this, basic network utils like seeing other users or moving objects had to be implemented which had to be easy to use and work with in the future.

Space optimizations in deterministic and concurrent call-by-need functional programming languages (2020)

Dallmeyer, Nils

In this thesis the space consumption and runtime of lazy-evaluating functional programming languages are analyzed. The typed and extended lambda-calculi LRP and CHF* as core languages for Haskell and Concurrent Haskell are used. For each LRP and CHF* compatible abstract machines are introduced. Too lower the distortion of space measurement a classical implementable garbage collector is applied after each LRP reduction step. Die size of expressions and the space measure spmax as maximal size of all garbage-free expressions during an LRP-evaluation, are defined. Program-Transformations are considered as code-to-code transformations. The notions Space Improvement and Space Equivalence as properties of transformations are defined. A Space Improvement does neither change the semantics nor it increases the needed space consumption, for a space equivalence the space consumption is required to remain the same. Several transformations are shown as Space Improvements and Equivalences. An abstract machine for space measurements is introduced. An implementation of this machine is used for more complex space- and runtime-analyses. Total Garbage Collection replaces subexpressions by a non-terminating constant with size zero, if the overall termination is not affected. Thereby the notion of improvement is more independent from the used garbage collector. Analogous to Space Improvements and Equivalences the notions Total Space Improvement and Total Space Equivalence are defined, which use Total Garbage Collection during the space measurement. Several Total Space Improvements and Equivalences are shown. Space measures for CHF* are defined, that are compatible to the space measure of LRP. An algorithm with sort-complexity is developed, that calculates the required space of independent processes that all start and end together. If a constant amount of synchronization restrictions is added and a constant number of processors is used, the runtime is polynomial, if arbitrary synchronizations are used, then the problem is NP-complete. Abstract machines for space- and time-analyses in CHF* are developed and implementations of these are used for space and runtime analyses.

Multi-document analysis : semantic analysis of large text corpora beyond topic modeling (2019)

Uslu, Tolga

Viele Methoden wurden in dieser Arbeit vorgestellt, die sich mit dem Hauptziel der automatischen Dokumentenanalyse auf semantischer Ebene befassen. Um das Hauptziel zu erreichen, mussten wir jedoch zunächst eine solide Basis entwickeln, um das Gesamtbild zu vervollständigen. So wurden verschiedene Methoden und Werkzeuge entwickelt, die verschiedene Aspekte des NLP abdecken. Das Zusammenspiel dieser Methoden ermöglichte es, unser Ziel erfolgreich zu erreichen. Neben der automatischen Dokumentenanalyse legen wir großen Wert auf die drei Prinzipien von Effizienz, Anwendbarkeit und Sprachunabhängigkeit. Dadurch waren die entwickelten Tools für die Anwendungen bereit. Die Größe und Sprache der zu analysierenden Daten ist kein Hindernis mehr, zumindest für die im Bezug auf die von Wikipedia unterstützten Sprachen. Einen großen Beitrag dazu leistete TextImager, das Framework, dass für die zugrunde liegende Architektur verschiedener Methoden und die gesamte Vorverarbeitung der Texte verantwortlich ist. TextImager ist als Multi-Server und Multi-Instanz-Cluster konzipiert, sodass eine verteilte Verarbeitung von Daten ermöglicht wird. Hierfür werden Cluster-Management-Dienste UIMA-AS und UIMA-DUCC verwendet. Darüber hinaus ermöglicht die Multi-Service-Architektur von TextImager die Integration beliebiger NLP-Tools und deren gemeinsame Ausführung. Zudem bietet der TextImager eine webbasierte Benutzeroberfläche, die eine Reihe von interaktiven Visualisierungen bietet, die die Ergebnisse der Textanalyse darstellen. Das Webinterface erfordert keine Programmierkenntnisse - durch einfaches Auswählen der NLP-Komponenten und der Eingabe des Textes wird die Analyse gestartet und anschließend visualisiert, so dass auch Nicht-Informatiker mit diesen Tools arbeiten können. Zudem haben wir die Integration des statistischen Frameworks R in die Funktionalität und Architektur von TextImager demonstriert. Hier haben wir die OpenCPU-API verwendet, um R-Pakete auf unserem eigenen R-Server bereitzustellen. Dies ermöglichte die Kombination von R-Paketen mit den modernsten NLP-Komponenten des TextImager. So erhielten die Funktionen der R-Pakete extrahierte Informationen aus dem TextImager, was zu verbesserten Analysen führte. Darüber hinaus haben wir interaktive Visualisierungen integriert, um die von R abgeleiteten Informationen zu visualisieren. Einige der im TextImager entwickelten Visualisierungen sind besonders herausragend und haben in vielen Bereichen Anwendung gefunden. Ein Beispiel dafür ist PolyViz, ein interaktives Visualisierungssystem, das die Darstellung eines multipartiten Graphen ermöglicht. Wir haben PolyViz anhand von zwei verschiedenen Anwendungsfällen veranschaulicht. SemioGraph, eine Visualisierungstechnik zur Darstellung multikodaler Graphen wurde auch vorgestellt. Die visuellen und interaktiven Funktionen von SemioGraph wurden mit einer Anwendung zur Visualisierung von Worteinbettungen vorgestellt. Wir haben gezeigt, dass verschiedene Modelle zu völlig unterschiedlichen Grafiken führen können. So kann Semiograph bei der Suche nach Worteinbettungen für bestimmte NLP-Aufgaben helfen. Inspiriert von all den Textvisualisierungen im TextImager ist die Idee für text2voronoi geboren. Hier stellten wir einen neuartigen Ansatz zur bildgetriebenen Textklassifizierung vor, der auf einem Voronoi-Diagram linguistischer Merkmale basiert. Dieser Klassifikationsansatz wurde auf die automatische Patientendiagnose angewendet und wir haben gezeigt, dass wir das traditionelle Bag-Of-Words-Modell sogar übertreffen. Dieser Ansatz ermöglicht es, die zugrunde liegenden Merkmale anschließend zu analysieren und damit einen ersten Schritt zur Lösung der Black Box zu machen. Wir haben text2voronoi auf literarische Werke angewendet und die entstandenen Visualisierungen auf einer webbasierten Oberfläche (LitViz) präsentiert. Hier ermöglichen wir den Vergleich von Voronoi-Diagrammen der verschiedenen Literaturen und damit den visuellen Vergleich der Sprachstile der zugrunde liegenden Autoren. Mit unserer Kompetenz in der Vorverarbeitung und der Analyse von Texten sind wir unserem Ziel der semantischen Dokumentenanalyse einen Schritt näher gekommen. Als nächstes haben wir die Auflösung der Sinne auf der Wortebene untersucht. Hier stellten wir fastSense vor, ein Disambigierungsframework, das mit großen Datenmengen zurecht kommt. Um dies zu erreichen, haben wir einen Disambiguierungskorpus erstellt, der auf Wikipedias 221965 Disambiguierungsseiten basiert, wobei die sich auf 825179 Sinne beziehen. Daraus resultierten mehr als 50 Millionen Datensätze, die fast 50 GB Speicherplatz benötigten. Wir haben nicht nur gezeigt, dass fastSense eine so große Datenmenge problemlos verarbeiten kann, sondern auch, dass wir mit unseren Wettbewerbern mithalten und sie bei einigen NLP-Aufgaben sogar übertreffen können. Jetzt, da wir den Wörtern Sinne zuordnen können, sind wir der semantischen Dokumentenanalyse einen weiteren Schritt näher gekommen. Je mehr Informationen wir aus einem Text und seinen Wörtern gewinnen können, desto genauer können wir seinen Inhalt analysieren. Wir stellten zudem einen netzwerktheoretischen Ansatz zur Modellierung der Semantik großer Textnetzwerke am Beispiel der deutschen Wikipedia vor. Zu diesem Zweck haben wir einen Algorithmus namens text2ddc entwickelt, um die thematische Struktur eines Textes zu modellieren. Dabei basiert das Modell auf einem etablierten Klassifikationsschema, nämlich der Dewey Decimal Classification. Mit diesem Modell haben wir gezeigt, wie man aus der Vogelperspektive die Hervorhebung und Verknüpfung von Themen, die sich in Millionen von Dokumenten manifestiert, darstellt. So haben wir eine Möglichkeit geschaffen, die thematische Dynamik von Dokumentnetzwerken automatisch zu visualisieren. Die Trainings- und Testdaten, die wir in diesem Kapitel hatten, bestanden jedoch hauptsächlich aus kurzen Textausschnitten. Zudem haben wir DDC Korpora erstellt, indem wir Informationen aus Wikidata, Wikipedia und der von der Deutschen Nationalbibliothek verwalteten Gemeinsamen Normdatei (GND) vereinigt haben. Auf diese Weise konnten wir nicht nur die Datenmenge erhöhen, sondern auch Datensätze für viele bisher unzugängliche Sprachen erstellen. Wir haben text2ddc so weit optimiert, dass wir einen F-score von 87.4% erzielen für die 98 Klassen der zweiten DDC-Stufe. Die Vorverarbeitung von TextImager und die Disambiguierung durch fastSense hatten einen großen Einfluss darauf. Für jedes Textstück berechnet text2ddc eine Wahrscheinlichkeitsverteilung über die DDC-Klassen berechnen Der klassifikatorinduzierte semantische Raum von text2ddc wurde auch zur Verbesserung weiterer NLP-Methoden genutzt. Dazu gehört auch text2wiki, ein Framework für automatisches Tagging nach dem Wikipedia-Kategoriensystem. Auch hier haben wir einen klassifikatorinduzierten semantischen Raum, aber diesmal basiert er auf dem Wikipedia-Kategoriensystem. Ein großer Vorteil dieses Modells ist die Präzision und Tiefe der behandelten Themen und das sich ständig weiterentwickelnde Kategoriesystem. Damit sind auch die Kriterien eines offenen Themenmodells erfüllt. Um die Vorteile von text2wiki zu demonstrieren, haben wir anschließend die von text2wiki bereitgestellten Themenvektoren verwendet, um text2ddc zu verbessern, so dass sich beide Systeme gegenseitig verbessern können. Die Synergie zwischen den erstellten Methoden in dieser Dissertation war entscheidend für den Erfolg jeder einzelnen Methode.

Automatic Topic Modeling in the Context of Digital Libraries: Mehrsprachige Korpus-basierte Erweiterung von text2ddc - eine experimentelle Studie (2020)

Baumartz, Daniel

Diese Bachelorarbeit befasst sich mit der Themenklassifikation von unstrukturiertem Text. Aufgrund der stetig steigenden Menge von textbasierten Daten werden automatisierte Klassifikationsmethoden in vielen Disziplinen benötigt und erforscht. Aufbauend auf dem text2ddc-Klassifikator, der am Text Technology Lab der Goethe-Universität Frankfurt am Main entwickelt wurde, werden die Auswirkungen der Vergrößerung des Trainingskorpus mittels unterschiedlicher Methoden untersucht. text2ddc nutzt die Dewey Decimal Classification (DDC) als Zielklassifikation und wird trainiert auf Artikeln der Wikipedia. Nach einer Einführung, in der Grundlagen beschrieben werden, wird das Klassifikationsmodell von text2ddc vorgestellt, sowie die Probleme und daraus resultierenden Aufgaben betrachtet. Danach wird die Aktualisierung der bisherigen Daten beschrieben, gefolgt von der Vorstellung der verschiedenen Methoden, das Trainingskorpus zu erweitern. Mit insgesamt elf Sprachen wird experimentiert. Die Evaluation zeigt abschließend die Verbesserungen der Qualität der Klassifikation mit text2ddc auf, diskutiert die problematischen Fälle und gibt Anregungen für weitere zukünftige Arbeiten.

Ordnung des Fachbereichs Informatik und Mathematik der Johann Wolfgang Goethe-Universität Frankfurt am Main für den Masterstudiengang Mathematik mit dem Abschluss "Master of Science (M. Sc.)" vom 25. Mai 2020 : genehmigt vom Präsidium am 30. Juni 2020 (2020)

Aufgrund der §§20, 44 Abs. 1 Nr. 1 des Hessischen Hochschulgesetzes in der Fassung vom 14. Dezember 2009 (GVBl. I, S. 666), zuletzt geändert durch Art. 2 des Gesetzes vom 18. Dezember 2017 (GVBl. I, S. 284), hat der Fachbereichsrat des Fachbereichs Informatik und Mathematik der Johann Wolfgang Goethe-Universität Frankfurt am Main am 25. Mai 2020 die folgende Ordnung für den Bachelorstudiengang Mathematik beschlossen. Diese Ordnung hat das Präsidium der Goethe-Universität gemäß §37 Abs. 5 Hessisches Hochschulgesetz am 30. Juni 2020 genehmigt. Sie wird hiermit bekannt gemacht.

Ordnung des Fachbereichs Informatik und Mathematik der Johann Wolfgang Goethe-Universität Frankfurt am Main für den Bachelorstudiengang Mathematik mit dem Abschluss "Bachelor of Science (B. Sc.)" vom 25. Mai 2020 : genehmigt vom Präsidium am 30. Juni 2020 (2020)

101 to 150

Open Access

Institutes

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

204 search hits