A Cache Filtering Optimisation for Queries to Massive Datasets on Tertiary Storage

Peter van der Stok

Presented at: DOLAP 1999

ABSTRACT

We consider a system in which many users run queries to examine subsets of a large object set. The object set is distributed across files on tape. A single subset of objects will be visited by multiple queries in the workload: this locality of access creates the opportunity for caching on disk. We introduce and evaluate a novel optimisation, cache filtering, in which the hot objects are automatically extracted from the files that are staged on disk. Cache filtering can lead to complex situations in the disk cache. We show that these do not prevent effective caching and we introduce a special cache replacement algorithm to maximise efficiency. Through simulations we evaluate the system over a broad range of likely query workloads. Depending on workload and system parameters, the cache filtering optimisation yields speedup factors of up to 6.
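The idea of cache filtering can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper: the file layout `TAPE_FILES`, the class `FilteringCache`, and the rule that an object counts as "hot" once a query has requested it are all assumptions made for illustration, and cache replacement is omitted entirely.

```python
from typing import Dict, Set

# Hypothetical layout: tape file name -> ids of the objects stored in it.
TAPE_FILES: Dict[str, Set[int]] = {
    "f1": {1, 2, 3, 4},
    "f2": {5, 6, 7, 8},
}


class FilteringCache:
    """Toy disk cache that keeps only the objects queries have asked for."""

    def __init__(self) -> None:
        self.cached_objects: Set[int] = set()  # objects extracted to disk
        self.tape_stages = 0                   # files staged from tape so far

    def run_query(self, wanted: Set[int]) -> None:
        # Objects already extracted to disk need no tape access.
        missing = wanted - self.cached_objects
        for objects in TAPE_FILES.values():
            needed = objects & missing
            if not needed:
                continue
            # Stage the whole file from tape, then keep only the requested
            # objects (assumed "hot"); the rest of the file is discarded.
            self.tape_stages += 1
            self.cached_objects |= needed


cache = FilteringCache()
cache.run_query({1, 2, 5})  # cold cache: stages f1 and f2
cache.run_query({1, 5})     # all objects already on disk: no staging
cache.run_query({2, 3})     # object 3 was filtered out earlier: stages f1
print(cache.tape_stages)    # 3
```

The third query shows the trade-off behind the "complex situations" mentioned above: filtering saves disk space by dropping cold objects, but a later request for one of them forces the file to be staged again.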
