Automatic Reclustering of Objects in Very Large
Databases for High Energy Physics
Peter van der Stok
Appeared in: Proceedings of International Database Engineering and Applications Symposium IDEAS98, pp 132-140, IEEE (1998).
ABSTRACT
In the very large object database systems planned for some future particle physics experiments, typical physics analysis jobs
will traverse millions of read-only objects, many more objects than fit in
the database cache.
Thus, a good clustering of objects on disk is critical to database performance.
We present the implementation and performance measurements of a prototype reclustering mechanism that was developed to optimize I/O performance
under the changing access patterns in a high energy physics database.
Reclustering is done automatically and on-line.
The methods used by our prototype differ greatly from those commonly found in general-purpose
reclustering systems.
By exploiting some special characteristics of the access patterns of the physics analysis jobs, the prototype manages to keep database I/O throughput
close to the optimum throughput of raw sequential disk access.