Some results on the theory and practice of algorithms

  • This thesis presents research which spans three conference papers and one manuscript which has not yet been submitted for peer review. The topic of 1 is the inherent complexity of maintaining perfect height in B-trees. We consider the setting in which a B-tree of optimal height contains n = (1−ϵ)N elements where N is the number of elements in full B-tree of the same height (the capacity of the tree). We show that the rebalancing cost when updating the tree—while maintaining optimal height—depends on ϵ. Specifically, our analysis gives a lower bound for the rebalancing cost of Ω(1/(ϵB)). We then describe a rebalancing algorithm which has an amortized rebalancing cost with an almost matching upper bound of O(1/(ϵB)⋅log²(min{1/ϵ,B})). We additionally describe a scheme utilizing this algorithm which, given a rebalancing budget f(n), maintains optimal height for decreasing ϵ until the cost exceeds the budget at which time it maintains optimal height plus one. Given a rebalancing budget of Θ(logn), this scheme maintains optimal height for all but a vanishing fraction of sizes in the intervals between tree capacities. Manuscript 2 presents empirical analysis of practical randomized external-memory algorithms for computing the connected components of graphs. The best known theoretical results for this problem are essentially all derived from results for minimum spanning tree algorithms. In the realm of randomized external-memory MST algorithms, the best asymptotic result has I/O-complexity O(sort(|E|)) in expectation while an empirically studied practical algorithm has a bound of O(sort(|E|)⋅log(|V|/M)). We implement and evaluate an algorithm for connected components with expected I/O-complexity O(sort(|E|))—a simplification of the MST algorithm with this asymptotic cost, we show that this approach may also yield good results in practice. In paper 3, we present a novel approach to simulating large-scale population protocol models. Naive simulation of N interactions of a population protocol with n agents and m states requires Θ(nlogm) bits of memory and Θ(N) time. For very large n, this is prohibitive both in memory consumption and time, as interesting protocols will typically require N > n interactions for convergence. We describe a histogram-based simulation framework which requires Θ(mlogn) bits of memory instead—an improvement as it is typically the case that n ≫ m. We analyze, implement, and compare a number of different data structures to perform correct agent sampling in this regime. For this purpose, we develop dynamic alias tables which allow sampling an interaction in expected amortized constant time. We then show how to use sampling techniques to process agent interactions in batches, giving a simulation approach which uses subconstant time per interaction under reasonable assumptions. With paper 4, we introduce the new model of fragile complexity for comparison-based algorithms. Within this model, we analyze classical comparison-based problems such as finding the minimum value of a set, selection (or finding the median), and sorting. We prove a number of lower and upper bounds and in particular, we give a number of randomized results which describe trade-offs not achievable by deterministic algorithms.

Download full text files

Export metadata

Author:David Hammer
Referee:Ulrich Meyer
Document Type:Doctoral Thesis
Date of Publication (online):2021/04/10
Year of first Publication:2020
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Granting Institution:Johann Wolfgang Goethe-Universität
Date of final exam:2020/11/27
Release Date:2021/10/11
Tag:I/O efficiency; algorithms; data structures; external memory
Page Number:138
The third chapter comprises a manuscript for a conference paper which was at the time the thesis was submitted not yet published at a scientific venue. After submission (and subsequent defense) of the thesis, the work detailed in this chapter was expanded and the final paper was published at SEA 2021. This version can be found at
Institutes:Informatik und Mathematik
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):License LogoDeutsches Urheberrecht