XESame with single table and non-unique IDs

JBuijs
edited January 2011 in - XESame
User 'Pachf' asked the following questions in this topic:

I saw the order example for XESame; I opened it and it worked fine
(visualization also). But I have a question: is it possible to create a
mapping for only one data table (no multiple tables with relations)?

For example if we have a structure in a csv file:

ID;date_of_order;date_of_delivery;product
1;01-01-2011;03-01-2011;A
1;01-01-2011;04-01-2011;A
2;01-01-2011;04-01-2011;A
3;01-01-2011;04-01-2011;B
3;01-01-2011;04-01-2011;B
4;01-01-2011;04-01-2011;A
...

Does an XESame mapping work with non-unique IDs, if I have only one ID for order
and delivery (and the product is delivered in two phases; e.g. part of product A
first with ID 1, then the rest also with ID 1 but on the same day)? So I
have full processes and I want to define the ordering as the trace, and order and
deliver as two events.

Or does it work only with multiple tables (order, delivery, etc.) and links between them?

Thanks
Joos Buijs

Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
Previously Assistant Professor in Process Mining at Eindhoven University of Technology

Comments

  • First of all, XESame can extract data from 1 or more tables in 1 database.

    As to your second question, XESame works with non-unique IDs but the resulting event log will be different.

    For example, if you would specify a trace for each ID then you would get 4 trace instances (so in your example there would be traces 1,2,3 and 4). You do not get 2 instances for ID 1 or 2 instances for ID 3. Trace identifiers should always be unique.

    If you would specify an event 'order creation' and relate it to the trace via the 'ID' value then you would get 6 instances of this event in your event log. Two for trace 1, one for trace 2, two for trace 3 and one for trace 4. The timestamp value of this event would be the value of the 'date_of_order' column.
    The same would happen for your event definition of the 'delivery' event. For this event however you specify the 'date_of_delivery' column to contain the date of the event.

    Each record in your data set is treated as an instance of an event, with a certain relation to a trace.
    You could restrict the number of events using SQL criteria in the 'WHERE' attribute of the 'Properties' tab in XESame. Here you can use any normal SQL condition that you would write to extract those events (with the trace IDs) that you want.

    For an introduction to XESame the best resource at the moment is my Master's thesis, which can be found here:
    http://prom.win.tue.nl/research/wiki/_media/xesame/xesma_thesis_final.pdf

    If you have any remaining questions, please let us know!
    Joos Buijs

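The trace and event counts described in the reply above can be checked outside XESame. The following is only a sketch: it loads the six sample records into an in-memory SQLite table (the table name `proba` and the `product = 'A'` criterion are assumptions made for illustration) and runs plain SQL of the kind a mapping would generate. It does not use XESame itself.

```python
import sqlite3

# The six sample records from the question (ID is deliberately non-unique).
ROWS = [
    (1, "01-01-2011", "03-01-2011", "A"),
    (1, "01-01-2011", "04-01-2011", "A"),
    (2, "01-01-2011", "04-01-2011", "A"),
    (3, "01-01-2011", "04-01-2011", "B"),
    (3, "01-01-2011", "04-01-2011", "B"),
    (4, "01-01-2011", "04-01-2011", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE proba (ID INTEGER, date_of_order TEXT, "
    "date_of_delivery TEXT, product TEXT)"
)
conn.executemany("INSERT INTO proba VALUES (?, ?, ?, ?)", ROWS)

# Traces: one per distinct ID, so duplicates collapse into a single trace.
trace_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT ID AS [traceID] FROM proba ORDER BY ID")]

# Events: every record yields one 'order' and one 'deliver' event.
order_events = conn.execute(
    "SELECT ID, 'order', date_of_order FROM proba").fetchall()
deliver_events = conn.execute(
    "SELECT ID, 'deliver', date_of_delivery FROM proba").fetchall()

# Number of records (and so of each event type) per trace.
events_per_trace = dict(conn.execute(
    "SELECT ID, COUNT(*) FROM proba GROUP BY ID"))

# A hypothetical WHERE criterion that restricts the extracted events.
restricted = conn.execute(
    "SELECT ID, 'order', date_of_order FROM proba WHERE product = 'A'"
).fetchall()

print(trace_ids)
print(len(order_events), len(deliver_events))
print(events_per_trace)
print(len(restricted))
```

With the sample data this gives four traces, six events of each type (two for trace 1, one for trace 2, two for trace 3, one for trace 4), and four 'order' events once the example criterion is applied.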
  • Dear JBuijs, 

    Thanks for the detailed answer.
    I have already read your thesis, and your order example was clear to me, but it is not so easy to apply the information in practice to a new (differently structured) dataset.

    Thanks, I will try to solve it.

  • Dear JBuijs,

    I tried to define the trace and events as you wrote (for my previous example), but the mapping does not work (in the visualization there are no lines between the table and the attributes of the events).
    How can I attach my example csv and mapping file?
    Could you please help me find what is wrong in this mapping:

    Log Attributes: 
    concept:name: 'simple ordering' (in properties: From: proba.csv)
    lifecycle:model: 'standard'

    Trace Attributes: 
    concept:name: 'ordering' (in properties: From: proba.csv  TraceID: ID)

    Event: order:
    concept:instance: ID (in properties: From: proba.csv  TraceID: ID)
    concept:name: 'order'
    lifecycle:transition: 'complete'
    time:timestamp: date_of_order

    Event: deliver:
    concept:instance: ID (in properties: From: proba.csv  TraceID: ID)
    concept:name: 'deliver'
    lifecycle:transition: 'complete'
    time:timestamp: date_of_delivery

    Thanks

  • This was the result in debug mode:

    2011-01-25 13:29:48: (DEBUG) Derby system dir before reboot: null
    2011-01-25 13:29:49: (DEBUG) Derby system dir after reboot: null
    2011-01-25 13:29:54: (PROGRESS) Starting step 1 of 6: initialization.
    2011-01-25 13:29:55: (PROGRESS) Starting step 2 of 6: Extracting log info.
    2011-01-25 13:29:55: (DEBUG) We are about to run the following query to extract the items for Log: 
    SELECT 'simple ordering' AS [concept_name], 'standard' AS [lifecycle_model] FROM proba.csv  
    2011-01-25 13:29:55: (DEBUG) We will insert a new attribute (concept:name with value simple ordering) into the cache DB for the Log item with ID 0
    2011-01-25 13:29:55: (DEBUG) We will insert a new attribute (lifecycle:model with value standard) into the cache DB for the Log item with ID 0
    2011-01-25 13:29:55: (PROGRESS) Starting step 3 of 6: Extracting traces.
    2011-01-25 13:29:55: (DEBUG) We are about to run the following query to extract the items for Trace: 
    SELECT DISTINCT ID AS [traceID], 'ordering' AS [concept_name] FROM proba.csv  
    2011-01-25 13:29:55: (ERROR) There is an error in the query (turn on 'debug' mode to see it) to fetch the information for Trace, this is the error we got: (-3010) [Microsoft][ODBC Szöveg illesztőprogram] Túl kevés paraméter. Helyesen: 1.
    2011-01-25 13:30:32: (PROGRESS) We stopped execution because of a critical error.
  • The ODBC message is in Hungarian:
    "Túl kevés paraméter. Helyesen: 1."

    It means:

    "Too few parameters. Expected 1."
  • In the Connection settings panel, did you enter the '[' and ']' symbols in the 'SQL Query Column Seperator' setting?

    Otherwise, check if there exists a column named 'ID' in your data source.
    Joos Buijs

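One way to check whether a column named 'ID' is visible in the data source is to read the header line of the CSV file with different delimiters. This is a minimal sketch using Python's standard `csv` module; the helper name `header_columns` is hypothetical, and how the ODBC text driver itself splits the line is an assumption not confirmed in this thread.

```python
import csv
import io

# First two lines of the sample file from the question.
SAMPLE = (
    "ID;date_of_order;date_of_delivery;product\n"
    "1;01-01-2011;03-01-2011;A\n"
)

def header_columns(text, delimiter):
    """Return the column names found in the first line for a given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)

comma_header = header_columns(SAMPLE, ",")
semicolon_header = header_columns(SAMPLE, ";")

print(comma_header, "ID" in comma_header)
print(semicolon_header, "ID" in semicolon_header)
```

If a reader splits this file on commas, the whole header becomes a single column name and no column called 'ID' exists; split on semicolons, the four expected columns appear.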
  • Dear JBuijs,

    The symbols '[' and ']' are there in the connection settings, and ID is the first column in my example dataset.

    What can be the problem with the mapping?

    How can I attach a mapping file to the forum?

    Thanks
  • You can add a file to a post by clicking on the 'Attach a file' link just below the edit box of a new post. It might be a good idea to compress the .mapping file (e.g. zip it) before uploading.
    Joos Buijs

  • OK, I'll try it: there are two files in the zip, a csv file and a mapping file.

    Please see what the error in the mapping could be.

    Thanks

  • Dear Pachf,

    I tried to run your mapping exactly as you specified it on the proba.csv file exactly as you provided and it works.
    The only thing I did was generate the schema.ini that is used by the ODBC CSV driver. I attached this file to the post (remove the '.txt' part). You might want to try using this schema.ini and run XESame again. If it works, could you tell me what the differences are between the two .ini files?

    Hope this helps!
    Joos Buijs

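The schema.ini attached to the reply above is not reproduced in this thread, so its exact contents are unknown. As a hedged sketch, a minimal schema.ini for a semicolon-delimited file with a header row could be generated as follows. `ColNameHeader` and `Format=Delimited(;)` are standard keys of the Microsoft text driver; the helper name `build_schema_ini` is hypothetical, and the temporary folder stands in for the folder that holds proba.csv, since the driver looks for schema.ini next to the data file.

```python
import tempfile
from pathlib import Path

def build_schema_ini(csv_name, delimiter=";", has_header=True):
    """Build a minimal schema.ini section for one delimited text file."""
    lines = [
        f"[{csv_name}]",
        f"ColNameHeader={'True' if has_header else 'False'}",
        f"Format=Delimited({delimiter})",
    ]
    # Windows-style line endings, since the file is read by a Windows driver.
    return "\r\n".join(lines) + "\r\n"

schema_text = build_schema_ini("proba.csv")

# Write the file into a folder (a temporary one here, for illustration only).
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "schema.ini"
    path.write_bytes(schema_text.encode("ascii"))
    written = path.read_bytes().decode("ascii")

print(written)
```

Comparing a file like this with the one generated by XESame may show which setting (for example the delimiter) was missing.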
  • Dear JBuijs,

    OK, thank you. I will look at it by the end of my workday :-) and then give you feedback.

  • Dear JBuijs, 

    It works! So the problem was that I did not have a schema.ini file.

    Thank you very much for your help!


  • No problem 'pachf', happy mining!
    Joos Buijs
