Automatic Reclustering of Objects in Very Large Databases for High Energy Physics

Peter van der Stok

Appeared in: Proceedings of the International Database Engineering and Applications Symposium (IDEAS'98), pp. 132-140, IEEE (1998).

ABSTRACT

In the very large object database systems planned for some future particle physics experiments, typical physics analysis jobs will traverse millions of read-only objects, many more than fit in the database cache. Good clustering of objects on disk is therefore critical to database performance. We present the implementation and performance measurements of a prototype reclustering mechanism, developed to optimize I/O performance under the changing access patterns of a high energy physics database. Reclustering is done automatically and on-line. The methods used by our prototype differ greatly from those commonly found in general-purpose reclustering systems. By exploiting some special characteristics of the access patterns of physics analysis jobs, the prototype keeps database I/O throughput close to the optimum throughput of raw sequential disk access.
