How do I create a mapping in XESame from .csv to XES?

JBuijs Posts: 912
edited October 2010 in - XESame
This question was originally asked at
http://prom.win.tue.nl/forum/index.php?p=/discussion/comment/71/#Comment_71:

---------------------------------------
That certainly helped me.

One more query, regarding generating a .xes / .mxml file from my own data source.

I have a .csv file. How do I generate the ".mapping" file for it, so that I can use it to generate .xes from XESame?

I tried to go through your thesis report, but I didn't find the explicit steps for generating mappings.

---
Thanks.
Narayan.

-------------------------------------

Answer:
You define a mapping in the definition workspace (the middle tab at the top). There you see a tree structure with the different elements of the mapping.
My Master's thesis explains how to define a mapping using this tree structure and what to enter in each field, mainly in Chapter 5, Section 5.3.2. Chapter 6 discusses more complex examples, which will also help you understand how a mapping is defined.
See http://prom.win.tue.nl/research/wiki/_media/xesame/xesma_thesis_final.pdf

Good luck Narayan!

Joos
Joos Buijs

Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
Previously Assistant Professor in Process Mining at Eindhoven University of Technology

Comments


  • Dear Joos,

    I am working on XESame with my own data file called IGSTK.csv

    2010-10-18 14:22:47: (PROGRESS) Starting step 1 of 5: initialization.
    2010-10-18 14:22:50: (PROGRESS) Starting step 2 of 5: Extracting log info.
    2010-10-18 14:22:50: (DEBUG) We are about to run the following query to extract the items for Log:
    SELECT 'No attributes' AS DUMMY FROM
    2010-10-18 14:22:50: (ERROR) There is an error in the query (turn on 'debug' mode to see it) to fetch the information for Log, this is the error we got: (-3506) [Microsoft][ODBC Text Driver] Syntax error in FROM clause.
    2010-10-18 14:22:51: (PROGRESS) We stopped execution because of a critical error.

    Please advise me on the above error.

    IGSTK.csv, IGSTK.mapping (which I configured and saved) and a few screenshots of my execution are uploaded to my
    website (http://www.public.asu.edu/~lmotamar/XESame/).

    Please have a look at the attributes and properties of the mapping and correct me where needed.

    Thanks
    ---
    Narayan.
  • JBuijs Posts: 912
    Dear Narayan,

    Unfortunately, XESame is not yet smart enough to detect that you didn't provide any attribute values for the log element, so it still builds a query and tries to run it. Since you didn't specify the 'from' property of the log element, executing the query gives an error.
    Solution: enter 'IGSTK.csv AS igstk' in the 'from' property of the log element (as you did for the trace element).

    Another question: is it correct that the event definition is also empty?

    Let me know how it works out.

    Joos
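
The failure above follows from the way the query appears to be assembled. The sketch below is a hypothetical reconstruction in Python, inferred only from the debug output in this thread; `build_query` is an illustrative name and not part of XESame. It shows why an empty 'from' property leaves a dangling FROM clause.

```python
def build_query(attributes, from_clause):
    """Assemble a SELECT statement in the shape seen in the debug log.

    attributes: dict mapping attribute keys (e.g. 'concept:name') to SQL expressions.
    from_clause: the value of the element's 'from' property (may be empty).
    This is an assumption-based sketch, not XESame's real implementation.
    """
    if attributes:
        # In the log, ':' in an attribute key shows up as '_' in the column alias.
        columns = ", ".join(
            f"{expr} AS [{key.replace(':', '_')}]" for key, expr in attributes.items()
        )
    else:
        # With no attributes, the log shows a placeholder column instead.
        columns = "'No attributes' AS DUMMY"
    return f"SELECT {columns} FROM {from_clause}"


# Empty 'from' property: the statement ends right after FROM.
broken = build_query({}, "")

# With the 'from' property filled in, the statement is complete.
fixed = build_query(
    {"concept:name": "'igstk log'", "lifecycle:model": "'standard'"},
    "IGSTK.csv AS igstk",
)
print(broken)
print(fixed)
```

The second call reproduces the Log query that appears in the later debug output of this thread, which suggests that filling in 'from' is all the log element needs.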

  • Dear Joos,

    This time I made the changes you mentioned. I added the log, trace and event definition info.

    I got this error.
    ---

    2010-10-21 00:29:05: (DEBUG) Derby system dir before reboot: null
    2010-10-21 00:29:10: (DEBUG) Derby system dir after reboot: null
    2010-10-21 00:29:15: (PROGRESS) Starting step 1 of 5: initialization.
    2010-10-21 00:29:18: (PROGRESS) Starting step 2 of 5: Extracting log info.
    2010-10-21 00:29:18: (DEBUG) We are about to run the following query to extract the items for Log:
    SELECT 'igstk log' AS [concept_name], 'standard' AS [lifecycle_model] FROM IGSTK.csv AS igstk
    2010-10-21 00:29:19: (DEBUG) We will insert a new attribute (concept:name with value igstk log) into the cache DB for the Log item with ID 0
    2010-10-21 00:29:19: (DEBUG) We will insert a new attribute (lifecycle:model with value standard) into the cache DB for the Log item with ID 0
    2010-10-21 00:29:19: (PROGRESS) Starting step 3 of 5: Extracting traces.
    2010-10-21 00:29:19: (DEBUG) We are about to run the following query to extract the items for Trace:
    SELECT DISTINCT AS [traceID], 'igstk: ' & igstk.id AS [concept_name], '' AS [IGSTK_details] FROM IGSTK.csv AS igstk
    2010-10-21 00:29:19: (ERROR) There is an error in the query (turn on 'debug' mode to see it) to fetch the information for Trace, this is the error we got: (-3504) [Microsoft][ODBC Text Driver] The SELECT statement includes a reserved word or an argument name that is misspelled or missing, or the punctuation is incorrect.
    2010-10-21 00:29:21: (PROGRESS) We stopped execution because of a critical error.

    ---

    I made the mapping definition based on your example.
    My data (IGSTK.csv) is a single file, unlike yours, so I created just a single event and tried to define the rest of the mapping much like yours.

    Can you find any corrections to be made?

    Again, I am placing the updated "igstk_latest.mapping" file on my website. Please have a look: http://www.public.asu.edu/~lmotamar/XESame/

    ---
    Narayan.
  • JBuijs Posts: 912
    Dear Narayan,

    As far as I can see, you didn't specify a 'traceID' for the trace element.
    The traceID is used to link events to their traces (cases).
    So for each trace you need to specify its unique identifier (most likely the primary key of the trace table).
    For each event definition you need to specify where the traceID is stored, so that events can be related back to their traces.

    Hope this helps!
    Joos Buijs

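
To make the role of the traceID concrete, here is a minimal Python sketch of the linking step. The rows are made up and `attach_events` is a hypothetical helper; it only illustrates the idea of matching events to traces on a shared key, not how XESame does it internally.

```python
from collections import defaultdict

# Hypothetical rows, standing in for what the trace and event queries return.
trace_rows = [
    {"traceID": 1, "concept:name": "igstk: 1"},
    {"traceID": 2, "concept:name": "igstk: 2"},
]
event_rows = [
    {"traceID": 1, "concept:name": "bug create"},
    {"traceID": 2, "concept:name": "bug create"},
    {"traceID": 1, "concept:name": "bug close"},
]


def attach_events(trace_rows, event_rows):
    """Group event names under the trace whose traceID they carry."""
    events_by_trace = defaultdict(list)
    for event in event_rows:
        events_by_trace[event["traceID"]].append(event["concept:name"])
    # Each trace is looked up by its unique traceID; without a traceID on
    # both sides there is nothing to match on.
    return {t["traceID"]: events_by_trace.get(t["traceID"], []) for t in trace_rows}


log = attach_events(trace_rows, event_rows)
print(log)
```

This is why the traceID has to be set twice: once on the trace element, and once on every event definition.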
  • Actually, when I ran the example I did set the traceID to 'id' for the trace element.

    I suppose it didn't save properly.

    I have now run it again with the traceID set to id for the trace element, and I am still getting the same error. Could you set the traceID to "id", run the mapping, and check for me?

    Please.
    ---
    Narayan.
  • JBuijs Posts: 912
    Dear Narayan,

    I inspected your igstk_latest.mapping file and did the following:
    - added id in the trace property traceID
    - added ' symbols (single quotes) around the event name and transition, since these are fixed values and not column names.

    I uploaded the new mapping file here:
    http://www.win.tue.nl/~jbuijs/files/tmp/igstk_Joos_TraceIDAdded.mapping

    I cannot test it because I don't have the .ini file that tells the ODBC driver how to interpret the CSV file.

    If you encounter any more errors, could you:
    - provide the .ini file so I can run the mapping file
    - provide the console output for that run with debug messages so I can pinpoint the error

    Hope this helps you further.
    Joos Buijs

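
The difference between a quoted fixed value and an unquoted column name can be tried out in any SQL engine. The sketch below uses Python's built-in sqlite3 module as a stand-in; it is not the Microsoft ODBC Text Driver used in this thread, and the table contents are made up.

```python
import sqlite3

# In-memory SQLite table standing in for IGSTK.csv; the row is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE igstk (id INTEGER, Reporter TEXT)")
conn.execute("INSERT INTO igstk VALUES (10842, 'alice')")

# 'bug create' is quoted, so it is a fixed value repeated for every row.
# igstk.Reporter is unquoted, so it is read from the column of that name.
row = conn.execute(
    "SELECT 'bug create' AS concept_name, igstk.Reporter AS org_resource "
    "FROM igstk"
).fetchone()
print(row)

# Without quotes, the fixed value is parsed as SQL and the statement fails.
try:
    conn.execute("SELECT bug create AS concept_name FROM igstk")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False
print(unquoted_ok)
```

The exact error text differs per driver, but the rule carries over: anything that is not a column name or expression needs single quotes in the attribute value.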
  • Joos,

    The mapping file you provided worked. Thank you.

    Now I understand the difference between fixed values and column names when specifying attribute values.

    One more small issue:
    In the mapping you provided, there is an event attribute "org:resource" with the value igstk.Reporter,
    where Reporter is a column name from IGSTK.csv.

    Similarly, for the "time:timestamp" event attribute I gave the value igstk.DateSubmitted, where DateSubmitted is a column name from IGSTK.csv.
    (Initially the column name was Date Submitted, which I changed to DateSubmitted by removing the space.)

    My problem now is that I saved the result as .MXML and opened it in ProM 5.2.

    In the filter section, it doesn't show me start event, end event and event type values.
    (I am uploading a screenshot of this for your reference at
    http://www.public.asu.edu/~lmotamar/XESame/prom5_screenshot.jpg)
    Although I am able to run the conversion and generate .MXML/.XES files, without proper attribute definitions I am not able to get proper results from the mining/analysis plugins.

    I think there is some problem in the attribute definitions.

    Please correct me.

    ---
    Narayan,
  • JBuijs Posts: 912
    Dear Narayan,

    In your mapping you specified one 'fixed' event, 'bug create', and gave it the lifecycle transition 'resolved'. This means that, in your case, every record in your source file becomes one trace with one event named 'bug create'.

    If there is more information in your log file you should add more event definitions. If there is for instance another timestamp column then this could indicate another event which you can include by creating another event definition.

    Furthermore, you used the lifecycle transition 'resolved'. This is not one of the standard lifecycle states. See the XES standard definition (http://www.xes-standard.org/_media/xes/xes_standard_proposal.pdf) for the standard lifecycle schema.

    So you have only one event per trace because you defined only one event.
    Your event type is 'unknown' since 'resolved' is not one of the standard lifecycle states. ProM 6 should be able to handle this better. In general, it's best to use one of the standard states.

    Hope this helps.

    Joos
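
A quick way to catch a non-standard transition before running the conversion is to check it against the transitions of the standard lifecycle model from the XES standard. The Python sketch below is only an illustration; `normalise_transition` is a hypothetical helper, and falling back to 'unknown' mirrors the behaviour described above rather than any documented ProM rule.

```python
# Transitions of the standard lifecycle model as listed in the XES standard.
STANDARD_TRANSITIONS = {
    "schedule", "assign", "withdraw", "reassign", "start", "suspend",
    "resume", "pi_abort", "ate_abort", "complete", "autoskip",
    "manualskip", "unknown",
}


def normalise_transition(value):
    """Return the value if it is a standard transition, otherwise 'unknown'."""
    return value if value in STANDARD_TRANSITIONS else "unknown"


print(normalise_transition("resolved"))
print(normalise_transition("complete"))
```

For a bug-tracking log, 'complete' is a natural replacement for a custom state such as 'resolved'.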
  • Dear Joos,

    The info you provided above helped. I was able to run the execution using standard lifecycle states.

    I have a new issue now, with JOIN statements.

    I have 2 events: 'Bugcreated' (associated with IGSTK.csv) and 'Bugcompleted' (associated with IGSTK_1.csv).

    I added this join statement to the 'Bugcreated' event: IGSTK_1.csv AS igstk_a ON igstk_a.id = igstk.id

    Now I am getting the following errors. Please advise.
    ---
    2010-10-31 00:26:54: (DEBUG) We will insert a new attribute (concept:name with value igstk: 10842) into the cache DB for the Trace item with ID 10842
    2010-10-31 00:26:54: (DEBUG) We will insert a new attribute (IGSTK_details with value ) into the cache DB for the Trace item with ID 10842
    2010-10-31 00:26:54: (PROGRESS) Starting step 4 of 6: Extracting Event: Bugcreated.
    2010-10-31 00:26:54: (DEBUG) We are about to run the following query to extract the items for Event: Bugcreated:
    SELECT id AS [traceID], IGSTK1.csv AS igstka ON igstka.id = igstk.id AS [orderAttribute], 'igstk ' & igstk.id  AS [concept_instance], 'bug create' AS [concept_name], 'assign' AS [lifecycle_transition], igstk.Reporter AS [org_resource], igstk.DateSubmitted AS [time_timestamp] FROM IGSTK.csv AS igstk 
    2010-10-31 00:26:54: (ERROR) There is an error in the query (turn on 'debug' mode to see it) to fetch the information for Event: Bugcreated, this is the error we got: (-3504) [Microsoft][ODBC Text Driver] The SELECT statement includes a reserved word or an argument name that is misspelled or missing, or the punctuation is incorrect.
    2010-10-31 00:26:56: (PROGRESS) We stopped execution because of a critical error.

    ---
    Narayan.
  • JBuijs Posts: 912
    Dear Narayan,

    As far as I can see, you entered 'IGSTK1.csv AS igstka ON igstka.id = igstk.id' into the 'order attribute' property. It belongs in the 'link' property instead. The order by property should be set to the same value as the timestamp (DateSubmitted).

    However, I don't think you need to join tables for this. You can use just the IGSTK.csv table.
    In one event definition you specify 'bugcreated' as its name, together with the column that holds the timestamp of this event.
    In the 'bugcompleted' event definition you specify the name and timestamp column of that event.
    Both definitions use the IGSTK.csv table.

    Let me know if this works for you.
    Joos Buijs

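
The idea of two event definitions reading from one table can be sketched as follows in Python. The row is invented, `DateCompleted` is a hypothetical name for the second timestamp column (only `DateSubmitted` is mentioned in this thread), and `extract_events` is an illustrative helper rather than XESame code.

```python
# One made-up record from a single source table.
rows = [{"id": 10842, "DateSubmitted": "2010-01-05", "DateCompleted": "2010-02-11"}]

# Each event definition names a fixed event and the column holding its timestamp.
event_definitions = [
    {"name": "Bugcreated", "timestamp_column": "DateSubmitted"},
    {"name": "Bugcompleted", "timestamp_column": "DateCompleted"},
]


def extract_events(rows, event_definitions):
    """Produce one event per (definition, row) pair, without any join."""
    events = []
    for definition in event_definitions:
        for row in rows:
            events.append({
                "traceID": row["id"],
                "concept:name": definition["name"],
                "time:timestamp": row[definition["timestamp_column"]],
            })
    # Order events within each trace by timestamp (ISO dates sort as text).
    events.sort(key=lambda e: (e["traceID"], e["time:timestamp"]))
    return events


events = extract_events(rows, event_definitions)
for event in events:
    print(event)
```

Every row then yields a trace with two events, ordered by their own timestamps, and no second file is involved.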
  • amrad Posts: 10
    Dear Joos,

    The content at this link, http://prom.win.tue.nl/research/wiki/_media/xesame/xesma_thesis_final.pdf, is no longer available.

  • JBuijs Posts: 912
    Hi Amrad,

    We shut down the prom.win.tue.nl server.
    Please look again at http://www.processmining.org/xesame/start

    See also http://www.win.tue.nl/promforum/discussion/171/prom.win.tue.nl-offline#Item_1
    Joos Buijs

  • amrad Posts: 10
    Hi Joos,

    Thanks a lot.
    The two links you provided work. I have installed ProM 6 and it works very well, as do the XESame examples. I will try to create a mapping for my project until you send me the mapping for the columns I have (No, Time, Source, Destination, Protocol, length and Info). These are the columns we discussed by email last time.

    Thank you for your help and your collaboration.
    Amrad
  • JBuijs Posts: 912
    Hi Amrad,

    I'm afraid that I can not help you with your mapping.
    If you read my thesis, Chapter 3 should provide you with an idea on how to specify the mapping.

    Good luck!
    Joos Buijs

  • amrad Posts: 10
    Dear Joos,

    I enjoyed reading your thesis and learned many things from it. I executed your examples and they work.
    After that, I created a mapping for my CSV file, but I ran into some errors.
    Could you please check the execution and tell me what might be wrong in the mapping and how to correct it?
    Attached is a zip file containing three files (New.csv, schema.ini, newmapping.mapping).

    Thank you for your help.

  • JBuijs Posts: 912
    Hi Amrad,

    I'm really busy (paper deadlines), so I cannot test your mapping and correct it for you.
    If searching for the error message that XESame gives you does not turn up hints on how to solve it, you can post it here and I can give you advice.
    Joos Buijs

  • amrad Posts: 10
    Hi Joos,

    Thank you for your reply.
    Here is the console output:

    2011-12-06 18:23:05: (PROGRESS) Starting step 1 of 5: initialization.
    2011-12-06 18:23:05: (PROGRESS) Starting step 2 of 5: Extracting log info.
    2011-12-06 18:23:05: (DEBUG) We are about to run the following query to extract the items for Log:
    SELECT 'reseau' AS [concept_name], 'standard' AS [lifecycle_model] FROM  
    2011-12-06 18:23:05: (ERROR) There is an error in the query (turn on 'debug' mode to see it) to fetch the information for Log
    2011-12-06 18:23:05: (ERROR) [Microsoft][ODBC Text Driver] Syntax error in FROM clause.
    2011-12-06 18:23:05: (WARNING) Cancelling execution! Just after running the query.
    2011-12-06 18:23:05: (NOTICE) Execution safely terminated.


    Joos, I apologize for bothering you, but it is a simple example.


  • amrad Posts: 10
    Hi Joos,

    For more clarification, you will find the console output attached. I made some changes.

    Thank you.

  • amrad Posts: 10
    Dear Joos,

    I read the answers you posted to Narayan for the same error and made some changes in my mapping, but I still get the same error. You will find the latest console output attached.

    Could you please look at the mapping and tell me how I can solve this error?

    Thanks a lot.

    xesame.png 929.1K
    data.zip 395.5K
  • JBuijs Posts: 912
    Hi Amrad,

    Good to see that you managed to solve the first error.

    The error you now get is caused by not defining a traceID in the trace properties.
    The traceID should yield a unique value that identifies each trace (in your case, for instance, the paquet detail no). This is used to connect the events to the traces.
    Joos Buijs

  • amrad Posts: 10
    Hi Joos,

    Thanks a lot for your reply.

    I have specified the traceID in the trace properties, but I still get the same error and the same message.

    Please advise me on what I can do in this case.

    Thank you in advance.
  • JBuijs Posts: 912
    Hi Amrad,

    Are you sure it is the exact same error?

    Are you sure the properties were saved? (Check this by going to another object and then back. You need to explicitly exit the field; only then are values saved.)
    Are you sure you also set the traceID property for your event mappings?
    Joos Buijs

  • amrad Posts: 10
    Hi Joos,

    I'm sure that it is the same error, and all the properties are saved. I am also sure that I set the traceID for my event mappings.

    The error is not yet solved.
    Thank you, Joos.
    Amrad
  • JBuijs Posts: 912
    Hi Amrad,

    Did you check whether the other table fields are reserved words for your database system? You can try to surround them with []
    (as is done automatically for the AS [...] in the query).
    Words like 'time' and 'length' are often reserved words, as the error message suggests.
    Joos Buijs

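
Bracketing every column name is a conservative way to rule out reserved words without looking each one up. The Python sketch below generates the bracketed references for the columns mentioned earlier in this thread; the alias `pkt` and the helper names are hypothetical, and which words are actually reserved depends on the driver in use.

```python
def bracket(name):
    """Wrap a column name in square brackets so it is read as an identifier."""
    if "[" in name or "]" in name:
        raise ValueError(f"cannot bracket a name that contains brackets: {name!r}")
    return f"[{name}]"


def qualified(alias, column):
    """Build an alias-qualified, bracketed column reference."""
    return f"{alias}.{bracket(column)}"


# Column names as listed earlier in this thread; 'pkt' is a made-up table alias.
columns = ["No", "Time", "Source", "Destination", "Protocol", "length", "Info"]
select_list = ", ".join(qualified("pkt", column) for column in columns)
print(select_list)
```

Each generated reference, such as pkt.[Time], can then be pasted into the corresponding attribute value in the mapping.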
  • amrad Posts: 10
    Hello Joos,

    First of all, I wish you a happy new year 2012.
    I'm back in the forum with the same problem. I have checked the table fields, and they are not reserved words in my database system.
    Do you have any other suggestions, please?

    Thank you.

  • JBuijs Posts: 912
    Hi Amrad,

    Happy new year to you too!

    It may not work, but I noticed that the Paquet_details attribute has '' as its value. Could you try to enter something there, such as 'bla'? It might be that the ODBC driver fails there.

    If this does not work, try to remove as many custom attributes as you can and check all remaining entries for errors.
    If the error still persists, try to run the query directly on the data source, for instance by using SquirrelSQL (just google for the tool or search the forum here, I suggested it somewhere before).

    Joos Buijs

  • Dear Joos,
    I am new to process mining. Could you please point me to a page that explains how to create a mapping file?
    Thanks in advance.
    Paul
  • JBuijs Posts: 912
    Dear Paul,

    Welcome to the forum!

    You can browse my Master's thesis, which can be found here:
    http://www.processmining.org/xesame/start

    Joos Buijs
