 
Instructor: Herman Haverkort (office: HG 7.35)
Course dates: February to June 2012
(Registration for this course is now closed. Students who did not register are welcome to our similar course next year.)
Update 15 June: added the slides of the last three lectures.

Update 17 February: included most of the lecture schedule, most of the literature that needs to be studied, the answers to the prerequisites test
(use username "io" and password "io" to access the file), and some "small print" at the bottom (policies regarding attendance, grading, deadlines etc.).

Traditionally, the running time of an algorithm is analysed in terms of the number
of computational steps it performs (comparisons, additions, etc.). However, when working
with huge amounts of data that do not fit in main memory, the number of disk accesses
is much more important for the running time than the number of computational steps.
Even when the data fit in main memory at once, a significant part of the running time
of an algorithm that works on large amounts of data is determined by the way in which
the algorithm exploits the cache: data that is cached can be accessed much faster
than data that resides in main memory. Therefore it is important to design algorithms
for large data sets in such a way, that the data needed next is always likely to be
data recently brought into cache.In this course we study socalled I/Oefficient
algorithms: algorithms and data structures that are designed in order to keep the number
of disk accesses or cache misses as small as possible. We discuss models for the design
and analysis of such algorithms, basic algorithms, data structures and paradigms, and
applications where I/Oefficiency makes the difference between algorithms that work and
algorithms that do not work in practice. Furthermore, we will see I/Oefficiency at work
in some simple experiments with huge data sets.
The course will make the students familiar with the basic concepts and techniques
for the design of I/Oefficient algorithms and data structures. The students will
learn to recognize I/Obottlenecks in algorithms, learn basic techniques to alleviate
such bottlenecks, and will get an idea of the orders of magnitude of the parameters
involved. The students will practice finding, reading, interpreting and explaining
advanced literature on the topic. Furthermore the course offers opportunities
for students to practice their presentation skills.
The course is taught in a mixed form. Some lectures are given by the instructor;
most lectures are given by the students, who will also run a number of experiments.The students collaborate in triples to prepare their lectures. Each triple gets assigned a particular
topic and has to give a lecture of 2 x 45 minutes. Links to basic literature for the
lectures are provided by the instructor. The students are expected to look up additional
literature when necessary. The instructor will advise the students on the preparation of their talks. The students
will meet with the instructor at least two weeks before the lecture for advice on its
contents, and at least one week before the lecture to discuss (the draft version of) the
presentation. Each student is expected to spend about 40 hours on preparing the lecture.
For some general advice on how to study the material and on how to prepare your talk,
see How to give a seminar talk. The students will also collaborate in (and also across) triples to study possible solutions
for problems where I/Oefficiency is crucial. The problems studied are abstracted versions
of fundamental problems occurring in reallife applications. The students will do experiments
with different solutions and write short papers about them. Near the end of the course, the
students review each other's papers and present their results to each other.
There is a homework assignment at the end of course.
Registration for this course is now closed.
All meetings for this courseexcept one on 5 Aprilare on Fridays, from 15:45 to 17:30 in the lecture hall of GeminiZuid.
date  time  room  topic  literature  lecturer(s)  PC time 
Fri  10  Feb  15:45  GemZ  Introduction  Meyer: sections 1 (except 1.5), 2 (intro), 2.1.1, 3.13.2; Demaine: section 2; slides  Herman Haverkort  N/A 
Sun  12  Feb  23:59  deadline for prerequisites test 
Fri  17  Feb  15:45  GemZ  Basics: stacks, queues, sorting, caching policies  see 1 Feb; in addition Duartes; Sleator, section 4; slides/notes  Herman Haverkort  N/A 
Fri  24  Feb  no meeting (carnival vacation)  N/A 
Fri  2  Mar  15:45  GemZ  Challenges: graph traversals, permutations, list ranking  Meyer: sections 1.5, 4.1, 4.2; Zeh: sections 2, 3; slides/notes  Herman Haverkort  27/24/3 Team ABM 
Fri  9  Mar  15:45  GemZ  Breadthfirst search  Meyer: sections 4.2, 4.3; and/or Zeh: section 6; slides directed BFS, slides undirected BFS  Team PQS  5/311/3 Team ABM 
Fri  16  Mar  15:45  GemZ  Ordered lists and search trees; report by team ABM;  Meyer: sections 2.1.2, 2.3 (intro), 2.3.12.3.4; Demaine: section 4.1 slides/notes  Herman Haverkort  12/318/3 Team ABM 
Fri  23  Mar  15:45  GemZ  Persistent Btrees and interval trees  Arge (2): sections 4, 6; slides persistent Btrees; slides internal interval trees; slides external interval trees  Team BNW  19/325/3 Team MNW 
Fri  30  Mar  no meeting (open door day)  26/31/4 Team LMP 
Thu  5  Apr  15:45  Aud 1  Priority queues, shortest paths and DFS  Meyer: sections 4.47; and/or Zeh: sections 6.1, 7; slides DFS; slides SSSP  Team AT  2/48/4 Team BNW 
Fri  13  Apr  no meeting (midsemester break)  9/415/4 Team LMP 
Fri  20  Apr  no meeting (midsemester break)  16/422/4 Team AT 
Fri  27  Apr  15:45  GemZ  Priority queues, timeforward processing and plane sweeping  Meyer: sections 3.4, 3.5 or Zeh: section 4; Arge (1): sections 14; De Berg: section 2.1; slides  Team LMP  23/429/4 Team AT 
Fri  4  May  15:45  GemZ  Shortest paths on grids and planar graphs  Zeh: section 8; slides  Team HLS  30/46/5 Team GGT 
Fri  11  May  no meeting (lecturers ill)  7/513/5 Team GGT 
Fri  18  May  no meeting (university closed on the day after Ancension Day)  14/520/5 Team PQS 
Fri  25  May  15:45  GemZ  Operations on sparse matrices  Bender: sections 14, 7; slides  Team GGT  21/527/5 Team HLS 
Fri  1  Jun  15:45  GemZ  Data structures for points in the plane  Arge (2): section 8.2.1; Beckmann; Kamel; Van den Bercken; slides  Team MNW  28/53/6 Team MNW 
Fri  8  Jun  15:45  deadline almostfinal (reviewable) reports on experiments with sorting & permutation, BFS, DFS, and timeforward processing  4/610/6 Team BNW 
15:45  GemZ  Topological sorting  Arge (3): sections 12; Ajwani; slides  Team ABM 
Fri  15  Jun  15:45  deadline almostfinal (reviewable) reports on experiments with connected components, matrix operations, search trees, and text search structures  11/617/6 Team PQS 
15:45  deadline reviews on experiments with sorting & permutation, BFS, DFS, and timeforward processing 
15:45  GemZ  Discussing experiments with sorting & permutation, BFS, DFS, and timeforward processing   Teams ABM, LMP, AT, GGT 
Fri  22  Jun  15:45  deadline reviews on experiments with connected components, matrix operations, search trees, and text search structures  
15:45  GemZ  Discussing experiments with connected components, matrix operations, search trees, and text search structures   Teams HLS, BNW, MNW, PQS 
Sun  8  Jul  23:59  deadline homework assignment  
At the end of each meeting, some time may be reserved to discuss the progress or results of early experimentation projects. Note that attending all meetings is mandatory. In case this is a problem, contact Herman Haverkort in advance. The teams and their assignments are as follows:
team  members  lecture  experimentation project  computer time 
ABM  Alex, Bauke, Martin  Topological sorting (8/6)  Sorting & Permutation  until 18/3 
AT  Ade, Teng  Depthfirst search etc. (5/4)  Depthfirst search  16/429/4 
BNW  Bram, Nicolas, Wilco  Persistent Btrees etc. (23/3)  Matrix operations  2/48/4; 4/610/6 
GGT  Gijs, Gijs, Twan  Sparse matrices (25/5)  Timeforward processing  30/413/5 
HLS  Hoisun, Luc, Stef  Planar graphs (4/5)  Connected components  21/527/5 
LMP  Lourens, Marlous, Pieter  Timeforward processing etc. (27/4)  Breadthfirst search  26/31/4; 9/415/4 
MNW  Marcel, Niels, Wouter  Data structures for points (11/5)  Search trees  19/325/3; 28/53/6 
PQS  Paul, Quirijn, Sander  Breadthfirst search (9/3)  Text search structures  14/520/5; 11/617/6 
[Ajwani]
D. Ajwani, A. CosgayaLozana, and N. Zeh,
Engineering a topological sorting algorithm for massive graphs,
In International Workshop on Algorithm Engineering and Experiments (ALENEX) 2011, p139150
Available from publisher;
local copy.[Arge (1)]
L. Arge,
The Buffer Tree: a technique for designing batched external data structures,
Algorithmica (2003) 37:124
Available from publisher;
local copy. [Arge (2)]
L. Arge,
External memory geometric data structures,
Lecture notes, 2005
Available from author's website;
local copy. [Arge (3)]
L. Arge and L. Toma,
Simplified external memory algorithms for planar DAGs.
In Proc. 9th Scandinavian Workshop on Algorithm Theory (SWAT) 2004, LNCS 3111:493503
Available from publisher;
local copy. [Beckmann]
N. Beckmann and B. Seeger,
A revised R*tree in comparison with related index structures,
In Proc. 35th ACM SIGMOD Int. Conf. on Management of Data (SIGMOD) 2009, p799812
Available from publisher;
local copy. [Bender]
M. A. Bender, G. Stølting Brodal, R. Fagerberg, R. Jacob, and E. Vicari,
Optimal sparse matrix dense vector multiplication in the I/OModel,
Theory of Computing Systems (2010) 47(4):934962
Available from publisher;
local copy of authors' manuscript. [Van den Bercken]
J. van den Bercken and B. Seeger,
An Evaluation of Generic Bulk Loading Techniques,
In Proc. 27th Int. Conf. on Very Large Data Bases (VLDB 2001), p461470
Available from publisher;
local copy. [De Berg]
M. de Berg, M. van Kreveld, M. Overmars, and O. Cheong (a.k.a. Schwarzkopf),
Computational geometry: algorithms and applications,
2nd edition, Springer, 2000.
[Demaine]
E. Demaine,
CacheOblivious Algorithms and Data Structures,
Lecture notes of EEF Summer School on Massive Data Sets, Aarhus, 2002
Available from the author's website;
local copy. [Duartes]
G. Duartes,
What your computer does while you wait,
blog post, 30 Nov 2008; [Kamel]
I. Kamel and C. Faloutsos,
Hilbert Rtree: An improved Rtree using fractals,
In Proc. of 20th Int. Conf. on Very Large Data Bases (VLDB) 1994, p500–509
Available from publisher;
local copy. [Meyer]
U. Meyer, P. Sanders, and J. Sibeyn (eds.),
Algorithms for Memory Hierarchies,
Lecture Notes in Computer Science 2625, Springer, 2003.
The whole book can be downloaded from the publisher's website.
For now, the following parts are most relevant:
 Chapter 1: models and lower bounds (includes the permutation lower bound)
 Chapter 2: basic external memory data structures (includes stacks, queues, linked lists, Btrees)
 Chapter 3: a survey of techniques for designing I/Oefficient algorithms (includes scanning, sorting, timeforward processing, deterministic computation of maximal independent sets, list ranking, and Euler tours)
 Chapter 4: Elementary graph algorithms in external memory (includes BFS, DFS, and SSSP in directed and undirected graphs)
 Bibliography and index
[Sleator]
D. D. Sleator and R. E. Tarjan,
Amortized efficiency of list update and paging rules,
Communications of the ACM (1985) 28(2):202208
Available from publisher;
local copy. [Zeh]
N. Zeh,
I/OEfficient Graph Algorithms,
Lecture notes of EEF Summer School on Massive Data Sets, Aarhus, 2002
Available from author's website;
local copy.
Students coming from the TU/e computer science bachelor programme must have
completed Algorithms (2IL15) successfully.If you got your prior education elsewhere, we will need to check if you have the
proper prior knowledge to be able to complete the I/Oefficient algorithms course
successfully. We will expect that you have experience with the following topics:
incremental algorithms, divide and conquer, greedy algorithms, proofs with induction and
invariants, Onotation and recurrences, sorting, search trees, depthfirst and
breadthfirst search, minimum spanning trees, and shortest path algorithms.
Experience with randomized algorithms is recommended. To be able to assess whether there are serious deficiencies in prior knowledge,
students are asked to make a prerequisites test.
Here are the right answers to the test,
including recommendations for further reading in case you got them wrong.
Use username "io" and password "io" to access the file.
If you have insufficient prior experience with algorithms, you will first need
to take Algorithms (2IL15), Advanced Algorithms (2IL45), or Geometric Algorithms (2IL55);
to be able to take these courses you might need Data Structures (2IL05) first.
(All of these courses are taught in English.)
Your grade will essentially be determined by adding up grades for:
 your presentation
 your experiments (and your report(s) about them)
 your review of another group's report
 your participation in class (which basically boils down to: the number of questions you asked in class)
 your homework assignment
All of the above assignments (presentation, experiments, review, your presence in class, homework)
are mandatory. If you skip an assignment, your final grade will be a five or less. Here is the small print for this course:
 Attending all meetings is mandatory.
Absence without approval after prior notice: 1 point off your grade each time.
 The student is responsible for handing in assignments in time; the student is
responsible for taking proper precaution against unexpected circumstances such
as illness, broken laptops, network failures, other deadlines etc., by not doing things in the
very last moment. That means that I will not accept your late assignment if you
did not contact me at least three days before the deadline about the delay
and with a reasonable suggestion for a solution. Depending on the consequences
for the course as intended, handing in late may reduce your grade substantially.
 I will report any suspicions of fraud to the exam committee. If I can prove it
(for example: if I find a fragment of more than two sentences of your work in an
older source on the web while you did not indicate that it was a quote), your
final grade will be a 1, independent of any additional measures taken by the department.
 