Q: IM-D and loops with overlapping start and end nodes

Hello there,

here's another rather detailed question about Inductive Miner DFG-based (IM-D). The question is twofold: about a matter of taste and about the specification of IM-D.

No purely DFG-based method can distinguish process trees that have the same DFG but different languages. Such trees can be distinguished only if they are not arbitrary but obey certain restrictions. My question is about what happens when such restrictions are violated, in this case restriction C(B).3, which says that no subtree should have overlapping start/end nodes. Sander discusses this in his PhD thesis in section 5.2.1.

In figure 5.3 he presents a DFG consisting of 4 activities a, b, c, d which are pairwise interconnected in both directions and where each of a and c are both start and end nodes, while b and d are neither. Here is a log that will give rise to this DFG:

(a, b, a)
(a, c, a)
(a, d, a)
(c, b, c)
(c, d, c)
(a, b, d, c)
(c, d, b, a)
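
As a quick sanity check, here is a minimal Python sketch that derives the DFG, the start nodes and the end nodes from the log above. The function name `directly_follows` is made up for illustration and is not part of ProM or any other library.

```python
from collections import Counter

# The seven traces listed above.
log = [
    ("a", "b", "a"),
    ("a", "c", "a"),
    ("a", "d", "a"),
    ("c", "b", "c"),
    ("c", "d", "c"),
    ("a", "b", "d", "c"),
    ("c", "d", "b", "a"),
]

def directly_follows(traces):
    """Return (edge counts, start counts, end counts) for a list of traces."""
    edges, starts, ends = Counter(), Counter(), Counter()
    for trace in traces:
        if not trace:
            continue  # an empty trace contributes nothing to the DFG
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # Every pair of consecutive events is one directly-follows edge.
        for x, y in zip(trace, trace[1:]):
            edges[(x, y)] += 1
    return edges, starts, ends

edges, starts, ends = directly_follows(log)
activities = sorted({act for trace in log for act in trace})

# Ordered pairs of distinct activities that never follow each other directly.
missing = [(x, y) for x in activities for y in activities
           if x != y and (x, y) not in edges]

print(sorted(starts), sorted(ends))  # ['a', 'c'] ['a', 'c']
print(missing)                       # []
```

All 12 edges between distinct activities are present, there are no self-edges, and only a and c occur as start and end nodes.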

When I feed this log to "Mine process tree with Inductive Miner - directly follows" in ProM 6.10, it will compute the following process tree:
PARA(LOOP(a, PARA(b,d), tau), c)

The loop has two redo parts. I am concerned with the second one, which is a tau-redo.

Without that tau-loop, only the trace (a, b, d, c) could be replayed, while with that tau-loop one can also replay (a, c, a). So replay fitness with respect to the log increases from 1/7 to 2/7 (which is still low). However, this happens at the cost of precision, because the model also allows unobserved traces like (a, a, a, c) etc. In fact, the model has a language that does not give rise to the same DFG as the one from which it was derived, but to a DFG that has an additional self-edge over a.

My question of taste is: Do you think that this model is preferable to one that didn't have that tau-redo? My personal opinion would be to prefer the model without the tau-redo, on account of DFG-equivalence.

The point is reinforced by the symmetry of the DFG: a and c are interchangeable here, so it must be mere coincidence that IM-D in ProM lets a end up inside the loop and c outside it. It might as well be the other way around. If it were, the tau-loop wouldn't even increase fitness with respect to our log: the only replayable trace would be (c, d, b, a), with or without it.

My theoretical question is: what clause in the specification of IM-D licenses the introduction of that strange tau-redo? As I read Sander's description of the loop cut and attendant DFG-split in sections 6.1.2 and 6.6.3 of his thesis, it shouldn't even be there. Am I right? (I'm probably wrong.)

-- Sebastian

Comments

  • Turns out that "Inductive Miner - directly follows" in ProM 6.10 seems to always generate superfluous tau-redos. It does so even for the simple one-trace log (a, b, a), licensing arbitrarily long sequences consisting only of a. I am inclined to consider that a bug, and probably not part of the specification, but I'd be happy if someone could confirm that.
  • I think the first example you found illustrates that the DFG abstraction is just not informative enough to derive a good-quality model in these cases (and thus why the visualisations used by most commercial process mining offerings have inherent issues). You could introduce as many events as you want, and as long as you always start and end with an a or a c, you'll get the same DFG, with no way for any miner based on that abstraction to see the difference.

    Notice though that it might still be possible to do something smart with the frequencies of the edges, which -are- available, e.g. by combining IMfd and IMcd.

    As for the example of (a, b, a), I've tried it in the current Nightly Build, and it gives me the model loop(a, b), which is what I would expect there. Please note that if you visualise a process tree in ProM, you'll be shown the ternary loop of the ProcessTree package, which has a body, a redo and an exit. The exit is always tau in loops coming from IM and IMd (thus, that tau is not a redo but the exit). To avoid confusion, you could use the visualiser "Process tree visualisation (Inductive visual Miner)". I think an apology is in order here for the confusing way loops are visualised in the different plug-ins of ProM.
    Sander Leemans
    Assistant Professor (Lecturer) at Queensland University of Technology
    Author of the visual Miner and Inductive Miner
  • Thanks for clearing up that confusion!
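
For readers who trip over the same visualisation: the loop reading from Sander's reply can be sketched in a few lines of Python. This assumes the usual process-tree loop semantics (run the body, then any number of redo-plus-body rounds, then the silent tau exit) and restricts body and redo to single activities. The helper `loop_language` is purely illustrative and not a ProM function.

```python
def loop_language(body, redo, max_redos):
    """Traces of loop(body, redo, exit=tau) with at most max_redos redo rounds.

    Assumption: body and redo are single activities, and the exit is tau,
    so it adds no visible event to a trace.
    """
    traces = []
    for rounds in range(max_redos + 1):
        trace = [body]
        for _ in range(rounds):
            trace += [redo, body]  # each redo must be followed by the body again
        traces.append(tuple(trace))
    return traces

lang = loop_language("a", "b", 2)
print(lang)                # [('a',), ('a', 'b', 'a'), ('a', 'b', 'a', 'b', 'a')]
print(("a", "a") in lang)  # False
```

Read this way, loop(a, b) with a tau exit replays (a, b, a) but never produces two consecutive a's, whatever the bound on redo rounds.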