There are many different options for doing your Master’s thesis project in the Algorithms Group. Most projects will follow the main focus of the group, which is algorithms and data structures for spatial data. We are interested in both fundamental problems concerning spatial data, as well as in algorithmic problems arising in application areas such as Information Systems (GIS), and Computer-Aided Design and Manufacturing (CAD/CAM). We are also interested in optimization problems. Thesis projects in our group range from theoretical to experimental, and they can either be internal or with one of our industrial contacts such as Philips or TNO. For a few more concrete examples, see Herman Haverkort's list of possible graduate topics. Furthermore you can have a look at the list below, containing master's theses that were supervised by people from our group.

If you are interested in doing a project with us, please contact Mark de Berg.

Supervisor(s): Kevin Buchin and A.D. Volkers

Repetition is a fundamental concept in music. This fact still applies between derivative works of music. Previous research has shown that folk songs display high similarity in repeating patterns across various versions of a song. For human experts, these repeating patterns are distinguishable, but tedious to discover by hand. Therefore, Pattern Discovery algorithms have been developed to perform this task. Most of these algorithms, however, are not suited for the intricacies of dealing with various works at a time. The algorithms also lack the capability of working with all the differences between what humans would consider similar patterns. We perform a qualitative analysis of the weaknesses in various pattern discovery algorithms with respect to groups of songs. We relate these weaknesses directly to what we find to be common properties in patterns discovered by a human analyst. These properties and weaknesses are then broken down into a set of requirements, which we try to meet with our own algorithmic approach. Our solution comes in the form of a pipeline. This pipeline consists of four tasks, each with responsibility for a subset of the established requirements. The first task transforms the set of songs, such that they retain their information, but are similar in the time- and pitch-domains. The second task is a typical pattern discovery algorithm. The third task combines the discovered patterns into clusters. Finally, the fourth task selects a coherent subset of the clustered patterns. We propose a set of solutions for each of these tasks. We demonstrate our algorithmic solution on the same families of folk songs from our initial quantitative analysis. We then evaluate it in terms of the requirements derived from that analysis. We conclude that our pipeline results in a manageable set of patterns, which display properties typically only handled well by human experts. These properties include rhythmic variation between pattern occurrences, embellishments, and positional context. Next to our algorithmic solution, we developed a tool for visualizing patterns in songs. We used this tool for the qualitative analysis, both of previous pattern discovery algorithms and our own. This tool has also been evaluated by an expert user study.

Supervisor(s): Kevin Buchin and Wouter Meulemans

With the recent rise of sensor-sourced positioning data, the need to derive meaningful conclusions from trajectory data increases. A prominent trajectory comparison metric, the Fréchet distance, is often used but is inefficent to compute for large input sets. This thesis attempts to improve the efficiency of Fréchet related computations for representative problems, with representative data. This thesis presents novel efficiency-focused adaptations to existing Fréchet algorithms and their composition into an efficient pruning-based query answering algorithm. The investigated approaches can be integrated piecemeal into other computation systems, and the thesis examines when and how which approach is most effective. The algorithms proposed in this thesis have been shown to be successful in practice, reaching second place in the ACM sigspatial GIScup of 2017.

Supervisor(s): M.T. de Berg and Joachim Gudmundsson

We study the problem of connecting a new vertex p to a Euclidean graph G using k new edges,, called feed links which connect p to the boundary of the plane face f of G containing p. We aim to connect the vertex in such a way that the maximum dilation from p to a set of destination vertices in the graph is minimised, where we define dilation as the ratio between the distance in the graph and Euclidean distance. Previous work mainly focus on reducing the dilation to points on the boundary of f rather than points anywhere in G. We give two different methods to solve this problem in polynomial time for a fixed k. We show that the problem is NP-hard when k is not fixed by a reduction from set cover. We give three different approximation algorithms that place k feed links such that the maximal dilation is approximated. The first algorithm gives a (1 +ε)-approximation for arbitrarily small ε, however, its running time is still exponential in k. The second gives a (t + 1)-approximation when f is a t-spanner and runs in O(mn^2 log n) time, where m denotes the number of vertices on the boundary of f and n denotes the total number of vertices in the graph. The third approximation algorithm also has a running time of O(mn^2 log n) and is obtained after a comparison between dilation and distance in the context of feed link placement. Its approximation ratio depends on the difference in Euclidean distances between p and the destination vertices.

Supervisor(s): K.A. Buchin

Simplifying polygonal curves at different levels of detail is an important problem with many applications. To facilitate exploration of these simplifications, we wish to seamlessly switch between different levels of detail. These simplifications must thus be consistent with one another. We call such a set of inter-consistent simplifications a progressive simplification. Existing geometric optimization algorithms used for curve simplification minimize the complexity of the simplified curve for each level of detail independently. These algorithms therefore do not produce progressive simplifications. All currently known progressive simplification algorithms are based on heuristics and therefore do not strictly minimize the simplification complexity. In this thesis, we wish to bridge the gap between these two types of algorithms. More specifically, we wish to produce progressive simplifications for which the sum or integral of the simplification complexity over all levels of detail is minimized. We present an algorithm that solves this problem in O(n^3*m) time, where n is the length of the input curve and m the number of different levels of detail. This algorithm is compatible with any distance measure such as Hausdorff or Frechet, and can be used to compute an optimal simplification for continuous scaling in O(n^5) time. We further explore two greedy algorithms with a running time of O(n^2*m). These algorithms greedily construct each simplification, and may therefore yield a set of simplifications for which the cumulative complexity is non-minimal.

Supervisor(s): Kevin Buchin, Dragan Kostić, and Johan Smeets

Autonomous mobile robot research is a broad field and new technologies are continuously being developed. Practical evaluation of novel algorithms and techniques tends to be expensive and involved due to the hardware, so research frequently resorts to simulation. This work presents the design of a low-cost robotics platform for the purpose of evaluating cooperative motion control algorithms. Much focus is put on sensor performance to reduce localization error, as this is a leading metric for algorithmic performance. This is aided by specific optimizations that take advantage of full-system integration, which is not possible for partial designs. A prototype of the hardware is built, and a partial software implementation addresses many of the integration problems. The platform uses modern technology such as Ultra-wideband radio (UWB), Lidar and computer vision, at a unit cost of around EUR500. Environmental awareness is shared among collaborating robots, by data structures that are well-suited for distributed updates. One use case is collaborative Coverage Path Planning under the constraints of position error. For this purpose, optimized strategies for trajectory tracking, online navigation, contour-based coverage path planning and formation control are proposed. And finally, the same software framework is also usable for simulations, and therefore suits many types of comparative analysis.

Supervisor(s): Kevin Buchin

*Second prize in Ngi-NGN Informatie Scriptieprijs voor Informatica en Informatiekunde.*

We propose a new approach for constructing the underlying map from trajectory data. This algorithm is based on the idea that road segments can be identified as stable subtrajectory clusters in the data. For this, we consider how subtrajectory clusters evolve for varying distance values, and choose stable values for these, in this way avoiding a global proximity parameter. Within trajectory clusters, we choose representatives, which are combined to form the map. We experimentally evaluate our algorithm on hiking and vehicle tracking data. These experiments demonstrate that our approach can naturally deal with different road widths, and differences in density of the data. It can also to an extent separate roads that run close to each other and can deal with outliers in the data, two issues that are notoriously difficult in road network reconstruction.

Supervisor(s): H.J. Haverkort

In the field of numerical simulation, a multi-scale grid and its traversal can have a huge impact on the cache-efficiency of the calculations performed. We present the Haverkort element traversal and refinement scheme which is based on the bisection of 8-reptile tetrahedra. We developed a stack-assignment scheme that allows the Haverkort traversal to use stacks for storing the input data, the output data, and the temporary data of vertices and elements. Our parallel plane approach to assign stacks to vertices gave solution that requires at most 8 stacks to store temporary vertex data for uniform and multi-scale refined grids independent of the refinement level. Furthermore 8 is also the lower bound for the number of stacks for our parallel plane approach. Additionally we show that a general lower bound exists of 5 temporary stacks for storing the vertex data during a traversal.

The combination of the Haverkort traversal with our Constant Stack algorithm is cache-oblivious, suitable for multi-level cache hierarchies and maintains its performance for multi-level refined grids and allows for fast and efficient space-filling-curve-based (re)partitioning. The Constant Stack algorithm has a time complexity of O(1) for stackassignment and -access and can achieve a cache-miss ratio of less than 5.2%. For all of these reasons we expect that a constant-number-of-stacks solution can compete with and outperform numerical simulations using cache-optimization techniques based on loop blocking when running on CPUs or GPUs. We apply the approach found for obtaining a constant-number-of-stacks solution for the Haverkort element traversal and a refinement scheme was applied to two other suitable 8-reptile polyhedra. Traversal and refinement schemes are given for an 8-reptile bisected cube and an 8-reptile bisected triangular prism needing 9 and 7 stacks respectively.

Supervisor(s): Mark de Berg and Dirk Gerrits (Kulicke & Soffa)

Kulicke & Soffa applies Surface Mount Technology to manufacture printed circuit boards (PCBs). They use two different types of machines to outfit these PCBs with components. We study the problem of distributing the workload among these different types of machines in order to minimize overall cycle time. Our approach consists of a simulated annealing algorithm, accompanied by a model of the machines used to estimate the cycle time of a particular solution. Various variants of simulated annealing and different ways to incorporate all operations of the machines in the model are discussed. The best variant of our simulated annealing algorithm yields an average improvement of 7.99% compared to the current system used to balance the workload. Taking the best result encountered in our experiments, we obtain an average improvement of 9.99%. We suggest improvements for future work that we believe to further improve the performance of our algorithms.

Supervisor(s): Mark de Berg and Kevin Buchin

In a range-assignment problem one is given a set of nodes (points) in the plane, and the goal is to assign a communication range to each node such that the resulting communication graph has some desired properties, while the total energy consumption is small. In this thesis project we consider the power assignment problem for moving nodes, where the desired property is that the communication graph contains a broadcast tree from a given source node. We develop a new algorithm for this problem, and show experimentally that the resulting power consumption is smaller than for existing algorithms.

Supervisor(s): B.M.P. Jansen

In general, a kernelization algorithm is an efficient preprocessing procedure that reduces the size of the input to a difficult computational problem, without changing the answer. For optimization problems, we know how much the size of an optimum solution changes by reducing the size of the input using a kernelization algorithm. Garnero et al. [16] recently proposed a new kind of kernelization algorithm for optimization problems on planar graphs: The algorithm reduces a subgraph H of planar graph G, to a different subgraph H0 called a representative. Subgraphs H and H0 are connected to the remainder of G by t vertices. This reduction is done for multiple of such subgraphs in G, that are also connected to the remainder of G by t vertices.

The size of an optimum solution after reducing H to H0, can be inferred by only looking at subgraph H and the representative H0: We say that subgraph H is equivalent to representative H0 if there is a constant c, such that the optimum solution of every graph G, that has H as a subgraph, changes by exactly c, by replacing H by H0. Therefore, the proposed kernelization algorithm reduces a subgraph H of G to an equivalent representative H0.

For the kernelization algorithm to be fast, one should be able to efficiently find a representative H0 that is equivalent to a subgraph H in G. This is possible if subgraph H has bounded treewidth or bounded pathwidth.

Let Rt be a set of these subgraphs called representatives. Garnero et al. showed that an upper bound on the size of representatives in Rt is doubly-exponentially dependent on t, the number of vertices with which these subgraphs are connected to the remainder of a graph. We propose lower bounds for the size of these representatives, also dependent on t.

We give a lower bound of Ω((2^t)/√(4t)) on the number of vertices of a representative in such a set Rt for INDEPENDENT SET. This bound holds for sets of planar representatives with bounded treewidth/pathwidth. We also show that the equivalence relation that we explained before has at least 2^((2^t)/√(4t)) equivalence classes for INDEPENDENT SET on general graphs. Furthermore, we improve on the results of Garnero et al. by giving an upper bound of 2^(2^t−1) on the number of equivalence classes for INDEPENDENT SET on general graphs. These bounds even hold for the number of equivalence classes for planar subgraphs of bounded treewidth/pathwidth.

Supervisor(s): Kevin Buchin

We investigate the number of simple cycles, Hamiltonian cycles and perfect matchings in planar graphs with n vertices.

By analyzing the way simple paths can be extended several steps at a time using a face-coloring technique, we show that there can be at most O(2.870214^n) simple cycles in planar graphs. We look into a result claimed in a previous work that there are at most O(2.2134^n) Hamiltonian cycles in planar graphs, and argue that the proof given there is likely flawed. Using the transfer matrix technique, we show that there is a family of planar graphs with O(1.535365^n) perfect matchings.

Supervisor(s): H.J. Haverkort and M.A. Westenberg

A metro map is a type of diagram in which shapes of lines are simplified and geographic distances are often heavily distorted in order to provide a clear overview of a complex network. Because of these distortions, information about the original (real-world, geographically correct) map is lost. As a result, travelers could take unnecessary long routes when traveling from A to B. In order to tackle this problem we introduce several visual cues that aid the user in finding the fastest route from A to B.

Supervisor(s): H.J. Haverkort and M.A. Westenberg

Schematized maps are often used in order to simplify transit map networks. However, in order to do so the length of different routes often changes. This makes it hard to see which route would be the fastest route between a given source and destination. In order to remedy this problem and help users using the map, we divide the map into zones of approximately equal diameter (travel time-wise). Now a route that is shorter can be found as the route that traverses less of these zones. We present an automated approach to find and visualize such zones for schematized maps.

Supervisor(s): P. Bose, M.T. de Berg, and H.M.M. van de Wetering

The topic of this thesis is geometric spanner networks. Geometric spanners are networks defined by a set of points P and a set of straight line segments E, called edges, that connect the points in P. The aim of spanners is to balance the total number of edges in the network against the largest detour one has to make when taking the shortest path of edges from an arbitrary point in P to any other. This largest detour defines the spanning ratio of the spanner. In this thesis we restrict our attention to the case where the points in P are located in the 2-dimensional plane.

Within the area of geometric spanners we address two subtopics. First we propose a new type of spanner and prove that it has spanning ratio 1.998, and consists of at most 8n − 6 edges. We furthermore show it has the desirable property that the shortest path between any two vertices consists of at most 53.2*log(n) edges. This spanner is based on the Delaunay triangulation, and is constructed by means of the hierarchical data structure introduced by Kirkpatrick. Using the same construction method, we show that it is possible to build a spanner with similar properties using any θk-graph as a base. In particular, this spanner has spanning ratio 1/(1−2 sin(π/k)), consists of kn edges, and has a short path between any two vertices of O(log n) edges.

The second part of this thesis focuses on a specific spanner called the θ5-graph. A special version of the θ5-graph is the one where we introduce a set S of line segments, called constraints. These constraints prevent us from drawing edges crossing them. Though (constrained) θk-graphs have been studied extensively, and the spanning ratio of most of these graphs have been (tightly) proved, the spanning ratio of the constrained θ5-graph remains undetermined so far. In this thesis we make an attempt to bound it by generalizing the approach Bose et al. used to bound the spanning ratio of the θ5-graph in the unconstrained setting. Unfortunately it turns out that their approach cannot be generalized easily to the constrained setting. We nonetheless present the proof to the point where it breaks down, and discuss alternative strategies that were considered.

Supervisor(s): K.A. Buchin and Tobias Bruckmann

A cable-driven parallel robot consists of an end-effector connected to a frame by cables which can be wound up or down in order to move the end-effector. Most approaches to calculate the forces needed to move this end-effector aim primarily at minimizing these forces. However, changes in forces should also be low. By interpreting the vector of forces as points in Euclidean space, this results in a kinetic geometric optimization problem with a smoothness constraint on the path of the solution. In this thesis we first consider a general way to add smoothness constraints to kinetic geometry problems. We then focus on solving the force problem for cable-driven parallel robot using linear programming with smoothness constraints. To make use of the low dimension of the problem on the one hand and the kinetic nature on the other hand, we base our approach on Seidel’s algorithm and re-use the final order of the constraints from the previous instance. We test our approach on randomly generated trajectories and find that it performs well for seven to ten cables and six degrees of freedom.

Supervisor(s): M.T. de Berg

The visibility index of a cell c in a grid terrain T is defined as the percentage of cells that are visible from c. We consider the problem of computing the visibility index of all cells c in a grid terrain T of size n x n. To this end we first study the 1-dimensional version of the problem. We propose an algorithm that efficiently computes the visibility index in O(n (log n)^2) time for a given 1-dimensional terrain of size n. The algorithm is able to compute the visibility index of 500,000 cells within 150 seconds using real-life data sets, while the brute-force approach can only compute it for 5,000 cells within that time frame. Our proposed algorithm can be used to efficiently approximate the 2-dimensional version of the problem and we present our ideas on how to do that.

Supervisor(s): J. Gudmundsson, M.T. de Berg, and K.A. Buchin

Given a simple polygon P in the plane and a source point p in its interior, we aim to connect p to the boundary ∂P of the polygon using one or more edges called feed links, such that the detour from every point on ∂P to p is as small as possible. This detour is defined as the ratio between the shortest path in the resulting network, and the Euclidean distance. In doing so we look at a more general version of the problem of reducing the maximum detour presented in earlier papers, as well as a variation of this problem where we try to minimise the average detour.

We first prove that a feed link that solves one of these problems optimally might have arbitrary bad results for the other problem, and hence we need to look at these problems independently. We then present an algorithm that places one feed-link optimally such that the average detour is minimised in linear time. Next a variation of this algorithm is presented that reduces finding the optimal placement for k feed links such that the average detour is minimised to solving O(n^(2k)) systems of k equations in k unknowns.

Finally we look at approximation algorithms for both minimising the average and minimising the maximum detour. This results in two algorithms. The first places k feed links such that the average detour is a (1 + ε)-approximation of the optimal solution in time O(k(n^3)/ε). The second places k feed links such that the maximum detour is a (1+ε)-approximation of the optimal solution in time O(n/ε(k + n)).

Supervisor(s): B.M.P. Jansen

Many problems we care for are known to be NP-complete. Therefore, we do not think they can be solved in polynomial time, so only an exponential time algorithm is known. To speed up the search for a solution, we want to start by preprocessing an input instance. You can try to make the input instance smaller in polynomial time, without changing the answer. After doing so, we can run the slow algorithm on a smaller input. This thesis investigated whether there exist such algorithms that reduce the number of edges in graph problems, and the number of clauses for logical formulas. Such an algorithm is called a sparsification.

We will show that for a number of decision problems on graphs, polynomial-time algorithms cannot compress instances of such problems to equivalent instances, of a possibly different problem, whose encoding size is sub-quadratic in the number of vertices. Here we only want to maintain the solution (YES/NO), therefore P = NP would imply that any NP-complete problem can be compressed in polynomial time to a single bit, indicating whether it was a YES- or a NO-instance. For example look at the 4-colorability problem, which asks if we can color all vertices in a graph in such a way that the endpoints of any edge in the graph have distinct colors. Assuming that NP is not contained in coNP/poly, we show that instances of 4-COLORING cannot be compressed to a sub-quadratic encoding in polynomial time. We obtain similar results for a number of other graph problems.

Finally we consider NOT-ALL-EQUAL SAT (NAE-SAT). This is a variant of the well-known satisfiability problem, that plays a central role in the theory of NP-completeness. We show that an instance of NAE-SAT with n variables and d literals per clause can not be compressed to an equivalent instance of size O(n^(d−1−ε)) for any ε > 0, unless NP is a subset of coNP/poly. Furthermore we present a generalized kernel that matches this lower bound.

Supervisor(s): H.J. Haverkort

In the field of numerical simulation the finite element method is one of the most popular general purpose techniques for computing accurate solutions. Since memory latency is one of the most serious bottlenecks in high-performance computing, I/O-efficient algorithms which minimize this latency have been developed for finite element methods defined on grid-based discretizations by ordering the data accesses along a space-filling curve. In this thesis we investigate the design of I/O-efficient algorithms for finite element methods for arbitrary discretizations in two and three dimensions. In 2D we are successful in extending the 2D algorithm to arbitrary subdivisions at the cost of doubling the traversal length. In 3D an optimal solution was not achieved, but a highly efficient solution which seems to scale well with the input size is still obtained by using heuristic approaches. Experimental evaluation shows that for tetrahedral subdivisions of point sets up to 80, 000 the cache-miss rate can be reduced to as low as 5%.

Supervisor(s): B.M.P. Jansen

The placing of textual labels is a central task to many visualisation applications. The basic variant of this problem, known as the point-feature labelling problem, has been studied extensively in literature. However, as the problem is NP-hard for most variations, the vast majority of known algorithms give an approximate solution. As such, in this paper we present a fixed parameter tractable algorithm for the point-feature labelling problem in the 1-slider model, returning a provably optimal solution. As parameter k, we use the maximum feedback edge number among all connected components of the graph underlying the problem. Our algorithm has an asymptotic running time of O(2^k*n^2), where n is the size of the point set used as input.

We implemented our algorithm, and present experimental results on real-world examples of data sets. We also tested two techniques that might cause a speed-up in practice, and analyse their effects.