Today, April 15th 2016, I have submitted the Divide and Conquer paper to The Computer Journal.
- Divide and Conquer
- H. M. W. Verbeek, W. M. P. van der Aalst, and J. Munoz-Gama
- Process mining has been around for more than a decade now, and in that period several discovery and replay algorithms have been introduced that work fairly well on average-sized event logs. Nevertheless, these algorithms have problems dealing with big event logs: if they do not run out of memory, they run out of time, because the problem handed to them is simply too complex to solve in reasonable time and space. For this reason, a generic approach has been developed that decomposes such big problems into a series of smaller (say, average-sized) problems. This approach offers formal guarantees for its results and also makes existing algorithms tractable for larger logs. As a result, discovery and replay problems may become feasible, or at least easier to handle. This paper introduces a tool framework, called Divide and Conquer, that fully supports this generic approach and has been implemented in ProM 6. Using this novel framework, the paper demonstrates that significant speed-ups can be achieved, both for discovery and for replay. It also shows that a maximal decomposition (that is, a decomposition into as many subproblems as possible) is not always preferable: non-maximal decompositions may run just as fast and generally provide better results.
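To give a feel for the decomposition idea, here is a minimal, hypothetical Python sketch: it partitions the activities into clusters, projects the log onto each cluster to obtain sublogs, runs a (stand-in) discovery step on each sublog, and merges the results. The cluster choice, the `project` and `discover` helpers, and the directly-follows counting are all simplifying assumptions for illustration; they are not the Divide and Conquer framework's actual API or the algorithms used in ProM 6.

```python
from collections import Counter

# Toy event log: each trace is a tuple of activity names.
log = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "b", "c", "d"),
]

# Hypothetical decomposition: a fixed partition of the activities.
# (A real decomposition would derive the clusters from the log itself.)
clusters = [{"a", "b"}, {"c", "d"}]

def project(log, activities):
    """Project each trace onto a subset of activities, yielding a sublog."""
    return [tuple(e for e in trace if e in activities) for trace in log]

def discover(sublog):
    """Stand-in for a discovery algorithm: count the directly-follows
    relations in the sublog instead of building a full process model."""
    dfg = Counter()
    for trace in sublog:
        for x, y in zip(trace, trace[1:]):
            dfg[(x, y)] += 1
    return dfg

# Solve each smaller problem independently, then merge the partial results.
submodels = [discover(project(log, c)) for c in clusters]
merged = sum(submodels, Counter())
print(dict(merged))  # directly-follows counts recovered per cluster
```

Each `discover` call only ever sees a sublog over a few activities, which is what keeps the individual subproblems small; the merge step is where a real framework would need the formal guarantees mentioned above to relate the combined result back to the original log.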