
Best way to speed up the evolutionary tree miner?

Hello everyone, 

I'm interested in the ETM because it lets you set a weight for each of the four quality dimensions used to compute the quality of the resulting process tree/Petri net. However, a single ETM run takes minutes, which is far longer than algorithms like the Inductive Miner. Since I want to apply ETM to 4000+ workflow logs, the total runtime would be impractical. My question is: what is the best way to speed up ETM?

I am aware that you can give ETM a time limit. I will check how (badly) setting this parameter to a few seconds affects the resulting model in terms of the four quality dimensions: replay fitness, precision, generalization and simplicity.

Hopefully someone has some tips for me!

Thanks in advance.

Cheers,
Daniël

Answers

  • Hi Daniel,

    Interesting question. I tried to address this during the development of the ETM as well.
    I found that the runtime increases exponentially as you add more activities (event classes). So you could try running the ETM on a subset of the activities. If desired, you can then feed the result into a subsequent ETM session with more activities.
    (If you're really adventurous you can dive into the code; some of it gradually increases the number of activities included. The CentralRegistry functions are a good starting point for following that thread.)

    Another option would be to filter out complex / unique traces.

    The majority of the time is spent computing the alignments.
    The number of possible trees explodes as the number of activities increases.

    Those are your two aspects to play with I guess.

    Hope this helps!!!
    Joos Buijs

    Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
    Previously Assistant Professor in Process Mining at Eindhoven University of Technology
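The two filtering ideas in the answer above (fewer activities, fewer unusual traces) can be sketched as a preprocessing step. This is a minimal illustration only: it assumes a log is a plain list of traces, each a list of activity names, which is not the ETM/ProM data model, and the function names `keep_top_activities` and `drop_rare_variants` are hypothetical.

```python
from collections import Counter


def keep_top_activities(log, k):
    """Project the log onto its k most frequent activities."""
    counts = Counter(act for trace in log for act in trace)
    keep = {act for act, _ in counts.most_common(k)}
    projected = [[act for act in trace if act in keep] for trace in log]
    # Drop traces that became empty after the projection.
    return [trace for trace in projected if trace]


def drop_rare_variants(log, min_count):
    """Remove traces whose exact activity sequence occurs fewer than min_count times."""
    variants = Counter(tuple(trace) for trace in log)
    return [trace for trace in log if variants[tuple(trace)] >= min_count]


# Made-up example log: activity "x" is rare, and two of the traces are unique.
log = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["a", "b", "x", "c"],
]
small_log = keep_top_activities(log, k=3)
common_log = drop_rare_variants(log, min_count=2)
```

Both filters reduce what the miner has to handle, at the price of a model that no longer describes the removed behaviour, so the thresholds `k` and `min_count` would have to be chosen per log.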
  • DStekel
    edited May 2018
    Dear Joos,

    Thank you for your response. I will try to run ETM on an event log where activities are filtered based on their frequency.

    At first I thought that the number of iterations would have a major effect on the runtime. Do you still think that decreasing the number of activities will have a bigger impact?

    Thanks for your help,
    Daniël
  • Hi Daniel,

    Of course, the number of iterations and the population size also influence the runtime, but decreasing them will result in a lower-quality answer.

    There are several parameters you can tweak, and it really depends on what your goal is, and thus what you're willing to sacrifice... :smile: 
    Joos Buijs

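To make the role of these parameters concrete, here is a toy evolutionary loop with a population size, a generation cap, a time limit and optional seed individuals. It is a generic sketch and not the ETM implementation: the `evolve` function, its parameters and the numeric toy problem are all assumptions made for illustration.

```python
import random
import time


def evolve(fitness, mutate, seeds, population_size, max_generations, time_limit_s, rng):
    """Toy elitist search; stops at max_generations or when the time limit is hit."""
    population = list(seeds)
    while len(population) < population_size:
        population.append(mutate(rng.choice(seeds), rng))
    deadline = time.monotonic() + time_limit_s
    evaluations = 0
    generations = 0
    while True:
        # Sorting calls fitness once per individual: this is the expensive step.
        population.sort(key=fitness, reverse=True)
        evaluations += len(population)
        if generations >= max_generations or time.monotonic() >= deadline:
            break
        parents = population[: max(1, population_size // 2)]
        children = [mutate(rng.choice(parents), rng) for _ in range(population_size - 1)]
        # Elitism: the best individual so far always survives.
        population = [population[0]] + children
        generations += 1
    return population[0], evaluations


def toy_fitness(x):
    """Toy objective with its maximum at x = 3."""
    return -(x - 3.0) ** 2


def toy_mutate(x, rng):
    """Small random change to a candidate."""
    return x + rng.gauss(0.0, 0.5)


best, evaluations = evolve(
    toy_fitness, toy_mutate, seeds=[0.0],
    population_size=10, max_generations=20, time_limit_s=1.0,
    rng=random.Random(0),
)
```

The number of fitness evaluations is roughly population size times number of generations, so halving either roughly halves the work. In the ETM each evaluation involves alignments, which is why the per-evaluation cost matters as much as the evaluation count.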
  • In my thesis I'm trying to find a suitable solution for mining process trees from a large collection of workflow logs (4000+ at the moment). At first I applied the Inductive Miner (IM), which completes within a few minutes. Then I read about the four quality dimensions and how you can tweak their weights in ETM, and I became interested in using it. ETM has one major disadvantage compared to IM: its runtime is much longer. I now see two options:
    1. Continue using the Inductive Miner in my research, and mention ETM as future work.
    2. Switch to ETM and try to find a setup where the total runtime is reasonable (within an hour or so).

    Option 1 will save me time, but will result in lower-quality process models than option 2.

    I will discuss this matter with my supervisors. Thanks for your input!

    Daniël
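As a back-of-the-envelope check on option 2, the time available per log follows directly from the numbers mentioned in the thread (about 4000 logs, about one hour in total). The `n_workers` parameter is a hypothetical addition: it assumes the logs are independent and can be mined in parallel, which the thread does not confirm for any particular ETM setup.

```python
def per_log_budget_seconds(total_budget_s, n_logs, n_workers=1):
    """Wall-clock seconds available per log when logs are spread evenly over workers."""
    return total_budget_s * n_workers / n_logs


# One hour for 4000 logs, processed one after another.
sequential = per_log_budget_seconds(3600, 4000)

# Same budget, assuming 8 independent workers (hypothetical).
eight_workers = per_log_budget_seconds(3600, 4000, n_workers=8)
```

Run sequentially, this leaves 0.9 seconds per log; with eight workers it is 7.2 seconds per log. Either way the per-log time limit would have to be in the range of seconds rather than minutes to meet a one-hour target.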
  • Please note that you can 'feed' (seed) the ETM with one or two models!
    This would enable a third scenario: start with the IM, feed that model (or models found with different parameters) to the ETM, and then let the ETM run for a set amount of time, focusing on a certain balance.

    Also note that the ETM has a Pareto mode, which results in a collection of models that each focus on a different trade-off between the provided quality dimensions. The Pareto mode can also be seeded with one or more models.
    Joos Buijs

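The difference between a single weighted balance and a Pareto collection can be illustrated with a small sketch. It assumes each candidate model has a score in [0, 1] for each of the four quality dimensions and that a weighted average combines them; the candidate names, scores and weights below are made up, and none of this is ETM code.

```python
DIMENSIONS = ("fitness", "precision", "generalization", "simplicity")


def weighted_quality(scores, weights):
    """Weighted average of the four quality scores."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight


def dominates(a, b):
    """True if a is at least as good as b on every dimension and better on one."""
    return all(a[d] >= b[d] for d in DIMENSIONS) and any(a[d] > b[d] for d in DIMENSIONS)


def pareto_front(candidates):
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Made-up scores: one seed model plus two hypothetical evolved variants.
candidates = {
    "im_seed": {"fitness": 1.0, "precision": 0.6, "generalization": 0.7, "simplicity": 0.9},
    "etm_a": {"fitness": 0.9, "precision": 0.9, "generalization": 0.7, "simplicity": 0.8},
    "etm_b": {"fitness": 0.9, "precision": 0.8, "generalization": 0.6, "simplicity": 0.8},
}
# Hypothetical weights that emphasise replay fitness and precision.
weights = {"fitness": 10, "precision": 5, "generalization": 1, "simplicity": 1}

front = pareto_front(candidates)
best = max(candidates, key=lambda name: weighted_quality(candidates[name], weights))
```

With a single set of weights you get one best model. The Pareto front instead keeps every model that is not beaten on all dimensions at once, so the choice of trade-off can be postponed until after mining.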