Hello there,
here's another rather detailed question about the DFG-based Inductive Miner (IM-D). The question is twofold: one part is a matter of taste, the other concerns the specification of IM-D.
No purely DFG-based method can distinguish between process trees that have the same DFG but different languages, unless the trees obey certain restrictions. My question concerns what happens when such a restriction is violated, in this case restriction C(B).3, which says that no subtree should have overlapping start/end nodes. Sander discusses this in section 5.2.1 of his PhD thesis.
In Figure 5.3 he presents a DFG over four activities a, b, c, d in which every pair of distinct activities is connected in both directions, and in which a and c are each both a start and an end node while b and d are neither. Here is a log that gives rise to this DFG:
(a, b, a)
(a, c, a)
(a, d, a)
(c, b, c)
(c, d, c)
(a, b, d, c)
(c, d, b, a)
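For anyone who wants to check the claim about the log, here is a minimal Python sketch that derives the directly-follows edges and the start/end activities from the seven traces above. The function name `dfg` and the tuple encoding of traces are just illustrative choices, not anything taken from ProM.

```python
from itertools import permutations

# The seven traces of the log, one tuple per trace.
log = [
    ("a", "b", "a"),
    ("a", "c", "a"),
    ("a", "d", "a"),
    ("c", "b", "c"),
    ("c", "d", "c"),
    ("a", "b", "d", "c"),
    ("c", "d", "b", "a"),
]

def dfg(traces):
    """Return (edges, start activities, end activities) of a list of non-empty traces."""
    edges = {(x, y) for t in traces for x, y in zip(t, t[1:])}
    starts = {t[0] for t in traces}
    ends = {t[-1] for t in traces}
    return edges, starts, ends

edges, starts, ends = dfg(log)

# Every ordered pair of distinct activities, i.e. the fully bidirectional DFG.
all_pairs = set(permutations("abcd", 2))

print(edges == all_pairs)            # True
print(sorted(starts), sorted(ends))  # ['a', 'c'] ['a', 'c']
```

So the log yields all 12 edges between distinct activities, no self-edges, and a and c as the only start and end activities.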
When I feed this log to "Mine process tree with Inductive Miner - directly follows" in ProM 6.10, it computes the following process tree:
PARA(LOOP(a, PARA(b,d), tau), c)
The loop has two redo parts. I am concerned with the second one, which is a tau-redo.
Without that tau-redo, only the trace (a, b, d, c) could be replayed, while with it one can also replay (a, c, a). So replay fitness with respect to the log increases from 1/7 to 2/7 (which is still low). However, this comes at the cost of precision, because the model also allows unobserved traces such as (a, a, a, c). In fact, the language of the model no longer gives rise to the DFG from which the model was derived, but to a DFG with an additional self-edge on a.
My question of taste is: Do you think that this model is preferable to one without that tau-redo? My personal preference would be the model without the tau-redo, on account of DFG-equivalence.
The point is reinforced by the symmetry of the DFG: a and c are interchangeable here, so it must be mere coincidence that IM-D in ProM puts a inside the loop and c outside it. It might as well be the other way around. In that case the tau-redo would not even increase fitness with respect to our log: the only replayable trace would be (c, d, b, a), with or without it.
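The DFG-equivalence point can be checked with a small sketch. It assumes my reading of the tree, namely that every child of LOOP after the first is an alternative redo part. The tuple encoding of trees and the names `shuffle`, `language` and `dfg_edges` are made up for illustration, and loops are unrolled at most twice, which is enough to expose every directly-follows pair in this small example.

```python
from itertools import permutations

def shuffle(u, v):
    """All interleavings of two traces (tuples of activity names)."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({(u[0],) + w for w in shuffle(u[1:], v)}
            | {(v[0],) + w for w in shuffle(u, v[1:])})

def language(tree, max_redo=2):
    """Traces of a process tree, each loop taking at most max_redo redo steps."""
    op, *kids = tree
    if op == "act":
        return {(kids[0],)}
    if op == "tau":
        return {()}
    langs = [language(k, max_redo) for k in kids]
    if op == "par":
        result = {()}
        for lang in langs:
            result = {w for u in result for v in lang for w in shuffle(u, v)}
        return result
    if op == "loop":
        # Assumption: first child is the body, all others are alternative redos.
        body, redos = langs[0], set().union(*langs[1:])
        result = set(body)
        frontier = set(body)
        for _ in range(max_redo):
            frontier = {u + r + b for u in frontier for r in redos for b in body}
            result |= frontier
        return result
    raise ValueError(f"unknown operator: {op}")

def dfg_edges(traces):
    """Directly-follows pairs of a set of traces."""
    return {(x, y) for t in traces for x, y in zip(t, t[1:])}

a, b, c, d, tau = ("act", "a"), ("act", "b"), ("act", "c"), ("act", "d"), ("tau",)
with_tau = ("par", ("loop", a, ("par", b, d), tau), c)
without_tau = ("par", ("loop", a, ("par", b, d)), c)

# DFG of the log: every ordered pair of distinct activities.
log_edges = set(permutations("abcd", 2))

print(dfg_edges(language(with_tau)) - log_edges)      # {('a', 'a')}
print(dfg_edges(language(without_tau)) == log_edges)  # True
```

Under these assumptions, the only edge the tau-redo adds is the self-edge on a, and the tree without it reproduces the edges of the original DFG exactly.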
My theoretical question is: what clause in the specification of IM-D licenses the introduction of that strange tau-redo? As I read Sander's description of the loop cut and attendant DFG-split in sections 6.1.2 and 6.6.3 of his thesis, it shouldn't even be there. Am I right? (I'm probably wrong.)
-- Sebastian