Ideas for data transformation & analysis

Hello all,

I am new to process mining and need some help or pointers on how to get value out of the data I have collected.

Below is an example of how my data are currently structured, but I can transform them if needed.

I have "status from" and "status to" columns, which can easily be combined into a "transition" or converted into one entry per status arrival.
In the exercises I have seen, there is usually one entry per case, listing all statuses (or transitions?) horizontally. Do I need to convert my data to that layout? Could I do that with ProM?

Please suggest the best way to visualize these data. Thanks.


Comments

  • JBuijs Posts: 912
    Hi Michalis,

    I think that you have 4 main columns:
    - case (as case identifier)
    - Status to (as activity)
    - Department (as resource, or group)
    - Date (as timestamp)

    I think that status from is not directly relevant, at least not for a first analysis, mainly because it is implied by the previous event (except for the first event of course).
    Similarly, duration is also derived.
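To illustrate why "status from" and duration can be treated as derived, here is a minimal Python sketch. It assumes each row carries only a case id, the "status to" value and a timestamp; the sample rows and the helper name `derive_previous` are hypothetical and not taken from the original data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample events: (case id, status to, timestamp).
events = [
    ("C1", "Open", "2016-12-01 09:00"),
    ("C1", "In analysis", "2016-12-01 11:30"),
    ("C1", "Closed", "2016-12-02 11:30"),
    ("C2", "Open", "2016-12-01 10:00"),
    ("C2", "Closed", "2016-12-01 10:45"),
]

def derive_previous(rows):
    """Recover the previous status and the elapsed time from the event order."""
    by_case = defaultdict(list)
    for case, status, ts in rows:
        by_case[case].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), status))
    result = []
    for case, case_events in by_case.items():
        case_events.sort()  # order the events of one case by timestamp
        prev_status, prev_time = None, None
        for time, status in case_events:
            # The first event of a case has no predecessor, so nothing to derive.
            elapsed = time - prev_time if prev_time is not None else None
            result.append((case, prev_status, status, elapsed))
            prev_status, prev_time = status, time
    return result

derived = derive_previous(events)
for row in derived:
    print(row)
```

Each output row holds the case, the recovered "status from", the "status to" and the time since the previous event of that case.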

    So I would suggest importing your data with this column mapping and seeing what comes out. You can load your data into ProM as a CSV file and run the conversion algorithm.

    We usually have data in this shape, where each row is one event (and not where each row is a case).
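The one-row-per-case layout from the exercises can be rebuilt from one-row-per-event data whenever it is needed, which is one reason the event layout is the more convenient starting point. The following is a rough Python sketch of that grouping; the rows and the function name `to_traces` are hypothetical, and it does not reproduce what ProM does internally.

```python
from collections import defaultdict

# Hypothetical event rows, one row per event: (case id, activity, timestamp).
rows = [
    ("C2", "Open", "2016-12-01T10:00"),
    ("C1", "In analysis", "2016-12-01T11:30"),
    ("C1", "Open", "2016-12-01T09:00"),
    ("C2", "Closed", "2016-12-01T10:45"),
    ("C1", "Closed", "2016-12-02T11:30"),
]

def to_traces(event_rows):
    """Group events by case and order them by timestamp: one trace per case."""
    grouped = defaultdict(list)
    for case, activity, timestamp in event_rows:
        grouped[case].append((timestamp, activity))
    # ISO 8601 timestamps in one consistent format sort correctly as strings.
    return {case: [activity for _, activity in sorted(case_events)]
            for case, case_events in grouped.items()}

traces = to_traces(rows)
for case, trace in sorted(traces.items()):
    print(case, "->", ", ".join(trace))
```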
    Joos Buijs

    Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
    Previously Assistant Professor in Process Mining at Eindhoven University of Technology
  • michalis Posts: 3
    edited December 2016
    Hi Joos,

    Thanks for your answer. The term "activity" is a bit confusing to me.

    Should I consider the "Status-to" as the activity, or the transition? I get totally different results when loading the CSV file.

    For example, if I use "In analysis" (a status) as the activity, I get about 25 different activities, while if I use "Open -> In analysis" (a transition) as the activity, I get more than 100 different combinations.
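The gap between the two counts is expected: with n distinct statuses there can be up to n * (n - 1) distinct transitions between different statuses, so 25 statuses allow up to 600. A small Python sketch of the comparison, using made-up traces and hypothetical helper names:

```python
# Hypothetical traces: the ordered statuses of each case.
traces = {
    "C1": ["Open", "In analysis", "Closed"],
    "C2": ["Open", "Closed"],
    "C3": ["Open", "In analysis", "Open", "In analysis", "Closed"],
}

def status_labels(traces):
    """Distinct activity labels when each status is an activity."""
    return {status for trace in traces.values() for status in trace}

def transition_labels(traces):
    """Distinct activity labels when each (from, to) pair is an activity."""
    return {(a, b) for trace in traces.values() for a, b in zip(trace, trace[1:])}

statuses = status_labels(traces)
transitions = transition_labels(traces)
print(len(statuses), "statuses,", len(transitions), "transitions")

# Upper bound on transition labels when self-loops are excluded.
n = len(statuses)
print("at most", n * (n - 1), "possible transitions")
```

Even in this tiny example the transition encoding already yields more labels than the status encoding, and the difference grows quickly with the number of statuses.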

    Is there a beginner's guide with best practices? (I followed the Coursera course two years ago.)

  • JBuijs Posts: 912
    Hi Michalis,

    I think your first option would work best, i.e. use the status as the activity.

    If you would like to know more I can recommend another process mining MOOC we developed titled "Process mining with ProM":
    https://www.futurelearn.com/courses/process-mining/
    This might answer your questions already in the first week :)

  • Thanks. I will check it out.