Refine
Document Type
- Conference Proceeding (1)
- Doctoral Thesis (1)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- Connected Components (1)
- Experimental Evaluation (1)
- External Memory (1)
- Graph Algorithms (1)
- I/O efficiency (1)
- Randomization (1)
- algorithms (1)
- data structures (1)
- external memory (1)
Institute
- Informatik (1)
- Informatik und Mathematik (1)
This thesis presents research which spans three conference papers and one manuscript which has not yet been submitted for peer review.
The topic of 1 is the inherent complexity of maintaining perfect height in B-trees. We consider the setting in which a B-tree of optimal height contains n = (1−ϵ)N elements where N is the number of elements in full B-tree of the same height (the capacity of the tree). We show that the rebalancing cost when updating the tree—while maintaining optimal height—depends on ϵ. Specifically, our analysis gives a lower bound for the rebalancing cost of Ω(1/(ϵB)). We then describe a rebalancing algorithm which has an amortized rebalancing cost with an almost matching upper bound of O(1/(ϵB)⋅log²(min{1/ϵ,B})). We additionally describe a scheme utilizing this algorithm which, given a rebalancing budget f(n), maintains optimal height for decreasing ϵ until the cost exceeds the
budget at which time it maintains optimal height plus one. Given a rebalancing budget of Θ(logn), this scheme maintains optimal height for all but a vanishing fraction of sizes in the intervals between tree capacities.
Manuscript 2 presents empirical analysis of practical randomized external-memory algorithms for computing the connected components of graphs. The best known theoretical results for this problem are essentially all derived from results for minimum spanning tree algorithms. In the realm of randomized external-memory MST algorithms, the best asymptotic result has I/O-complexity O(sort(|E|)) in expectation while an empirically studied practical algorithm has a bound of O(sort(|E|)⋅log(|V|/M)). We implement and evaluate an algorithm for connected components with expected I/O-complexity O(sort(|E|))—a simplification of the MST
algorithm with this asymptotic cost, we show that this approach may also yield good results in practice.
In paper 3, we present a novel approach to simulating large-scale population protocol models. Naive simulation of N interactions of a population protocol with n agents and m states requires Θ(nlogm) bits of memory and Θ(N) time. For
very large n, this is prohibitive both in memory consumption and time, as interesting protocols will typically require N > n interactions for convergence. We describe a histogram-based simulation framework which requires Θ(mlogn) bits of memory instead—an improvement as it is typically the case that
n ≫ m. We analyze, implement, and compare a number of different data structures to perform correct agent sampling in this regime. For this purpose, we develop dynamic alias tables which allow sampling an interaction in expected amortized
constant time. We then show how to use sampling techniques to process agent interactions in batches, giving a simulation approach which uses subconstant time per interaction under reasonable assumptions.
With paper 4, we introduce the new model of fragile complexity for comparison-based algorithms. Within this model, we analyze classical comparison-based problems such as finding the minimum value of a set, selection (or finding the median), and sorting. We prove a number of lower and upper bounds and in particular, we give a number of randomized results which describe trade-offs not achievable by deterministic algorithms.
We empirically investigate algorithms for solving Connected Components in the external memory model. In particular, we study whether the randomized O(Sort(E)) algorithm by Karger, Klein, and Tarjan can be implemented to compete with practically promising and simpler algorithms having only slightly worse theoretical cost, namely Borůvka’s algorithm and the algorithm by Sibeyn and collaborators. For all algorithms, we develop and test a number of tuning options. Our experiments are executed on a large set of different graph classes including random graphs, grids, geometric graphs, and hyperbolic graphs. Among our findings are: The Sibeyn algorithm is a very strong contender due to its simplicity and due to an added degree of freedom in its internal workings when used in the Connected Components setting. With the right tunings, the Karger-Klein-Tarjan algorithm can be implemented to be competitive in many cases. Higher graph density seems to benefit Karger-Klein-Tarjan relative to Sibeyn. Borůvka’s algorithm is not competitive with the two others.