Berry Schoenmakers: PhD thesis

Data Structures and Amortized Complexity in a Functional Setting

PhD thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, September 1992.

Abstract

Part I of this thesis (Algebras, Algebra refinement, Amortized complexity,
Implementation aspects, Analysis of functional programs and algebras,
More on lists and trees) presents a mathematical environment for amortized
complexity studies (physicist's method of Sleator and Tarjan). A restricted 
type of algebras -- called monoalgebras -- play a significant role. Part II 
(Tricky representations of sets and arrays, Maintaining the minimum of a
list, Skew heaps, Fibonacci heaps, Path reversal, splaying, and pairing) 
consists of a number of case studies which are devoted to advanced 
applications of Part I.                                 C. Calude (Auckland)

From "Zentralblatt für Mathematik und ihre Grenzgebiete
Mathematics Abstracts, No.765, Sec. Theory of Computing".

Available in parts

Also some papers related to the thesis are available. To get into the mood, check out these fine interactive animations of splay trees (and more) written in Java by Doug Ierardi and Ta-Wei Li! The paper A Tight Lower Bound for Top-Down Skew Heaps confirms the conjecture that the bounds obtained in Chapter 9 are tight. Interestingly, this result shows that the exact amortized complexity of top-down skew heaps is intimately connected to the golden ratio.

Also see Lee Killough's list of priority queue papers for developments in this area.

Overview

The thesis is divided into two parts. Part I is the more theoretical part in which the above mentioned subjects are elaborated and illustrated with simple examples. Part II consists of a number of case studies which contain more advanced applications of the theory presented in Part I. A major criterion for the selection of the cases in Part II has been the extent to which they serve to illustrate our way of deriving potential functions. Several well-known data structures are analyzed to obtain interesting upper bounds on the amortized running time of the operations.

Part I starts with the definition of an algebra as a collection of datatypes (sets) and operations (relations) in Chapter 1. Notations for important data types and operations are also introduced in this chapter, and it is shown by a number of examples how well-known data structures can be viewed as algebras. Furthermore, a restricted type of algebras, called monoalgebras, is introduced, for which some theory is developed throughout the remainder of Part I.

In Chapter 2, a notion of refinement is defined for monoalgebras. To prepare for this definition, first a notion of data refinement of functions is introduced, which is in turn introduced as a generalization of algorithmic refinement. Our main reason for introducing a notion of algebra refinement is that it enables us to formulate a specification of a data structure as a quest for a concrete refinement of a given abstract algebra. In connection with this, two properties of algebras, called surjectivity and injectivity, are studied at the end of this chapter.

Chapters 3 and 5 are devoted to amortization. In Chapter 3, amortization is first described in an imperative framework. Subsequently, the principle of amortization is discussed in a more abstract setting and it is argued that the physicist's and the banker's methods are equally powerful. Since the former method lends itself better to formalization, we go on to show in Chapter 5 how potential functions can be used to analyze functional programs in a compositional way. Also a general scheme is presented for monoalgebras according to which amortized costs of various types of operations can be defined.

Chapter 4 introduces the functional program notation used throughout the thesis. In addition to algebras normally provided by purely-functional languages, we will also use algebras involving arrays and pointers in our functional programs. For reasons of efficiency, operations of these algebras are implemented destructively. This brings along some restrictions on the use of these algebras. As will be explained briefly, the well-known method of eager evaluation suffices as simple and efficient evaluation method for all programs in this thesis.

Chapter 6 concludes Part I. This chapter introduces some typical tree algebras, which serve as intermediate algebras in the data structure designs in Part II. Dependent on the set of operations required in a design, a suitable view (read "type") of trees is chosen so that each operation can be defined concisely. In this chapter we also deal with the implementation of these algebras (at pointer level), so that we do not need to address this issue in Part II.

Part II starts with the presentation of some tricky representations of sets and arrays in Chapter 7. Basically, an implementation of arrays is presented that achieves O(1) cost for the initialization of arrays of length N (instead of O(N) cost). The standard array implementation achieves O(1) amortized cost for this operation. We use these array implementations to implement bounded sets (subsets of [0..N)).

In Chapter 8, maintaining the minimum of a list, we present implementations of several list algebras, all of which include an operation for the computation of the minimum value of a list. It is shown how the complexity of these implementations increases as the set of list operations gets more advanced.

Chapter 9 presents two nice implementations of mergeable priority queues, viz. our versions of the top-down and bottom-up skew heaps of Sleator and Tarjan. A large part of this chapter is devoted to the calculational derivation of suitable potential functions for these data structures. Our results reduce the bounds obtained by Sleator and Tarjan by more than a factor two. For top-down skew heaps, the paper A Tight Lower Bound for Top-Down Skew Heaps confirms the conjecture that the bounds are tight. Interestingly, this shows that the exact amortized complexity of top-down skew heaps is intimately connected to the golden ratio.

In Chapter 10 we present a generalization of Fibonacci heaps, which implement mergeable priority queues extended with the so-called "decrease key" operation. This algebra of priority queues plays a central role in some important graph algorithms. A pleasing result of this chapter is that our formal description of Fibonacci heaps easily fits on one page.

Chapter 11 is centered around the derivation of potential functions for three similar operations on trees, viz. path reversal, splaying, and pairing. Although the bounds obtained in this chapter do not improve the bounds obtained by the inventors of these operations, we have included our analyses because we have been able to derive the required potential functions to a large extent.

Return to ~berry.