
Could not parse NULL timestamp! error convert csv to xes

MunaMuna Posts: 23
I'm trying to convert a CSV file to an XES log.
The file has four attributes: caseId, activity, ActivityStartTime, processingtimeActivity

  • ActivityStartTime has the format dd-MM-yyyy:HH:mm
  • processingtimeActivity has the format HH:mm:ss
Both ActivityStartTime and processingtimeActivity are sometimes null. This leads to an error in the conversion and an empty log; when I display the error, it turns out to be caused by the null values.
I have no clue how to resolve this.
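
To see which cells are the problem before importing, a small pre-check outside ProM can help. The following is only a sketch: Python, the helper names `parse_start` and `parse_duration`, and the treatment of the literal text "null" as empty are assumptions, not part of ProM or of this thread. The format string is the Python equivalent of the dd-MM-yyyy:HH:mm pattern given above and needs adjusting if the file uses a different layout.

```python
from datetime import datetime, timedelta

# Python equivalent of the dd-MM-yyyy:HH:mm pattern from the post (assumption).
START_FORMAT = "%d-%m-%Y:%H:%M"


def is_missing(text):
    """True for None, an empty cell, or the literal text 'null'."""
    return text is None or text.strip() == "" or text.strip().lower() == "null"


def parse_start(text):
    """Return a datetime, or None when the cell is missing."""
    if is_missing(text):
        return None
    return datetime.strptime(text.strip(), START_FORMAT)


def parse_duration(text):
    """Return a timedelta for an HH:mm:ss cell, or None when it is missing."""
    if is_missing(text):
        return None
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

Rows for which `parse_start` returns None are the ones that would presumably trigger the "Could not parse NULL timestamp" error during the import.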

Comments

  • hverbeekhverbeek Posts: 426
    Hi,

    Since you mention in another post on the forum that you want to do performance analysis on this data, I suggest providing non-null values for the null time stamps. Performance analysis requires the time stamps to be there.

    A simple way to do this would be to copy the time stamp from the previous row (and use some fixed date in case of the first row) if a time stamp is null.

    I would also replace the processingTimeActivity with the ActivityCompletetime (= ActivityStarttime + processingtimeActivity), as many of the analysis techniques use the start and complete times of the activity. If need be, the duration will be inferred by the techniques.

    Kind regards,
    Eric.
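
The two suggestions above (copy a null time stamp from the previous row, and derive the complete time as start plus processing time) could be sketched in Python roughly as follows. The function names and the fallback date are made up for illustration; the fallback is an arbitrary fixed date and not a recommendation. The fill runs over the rows in file order, so restarting it per caseId would be a separate design choice.

```python
from datetime import datetime, timedelta

# Arbitrary fixed date used when the very first row has no time stamp (assumption).
FALLBACK_START = datetime(2019, 1, 1)


def fill_from_previous(starts, fallback=FALLBACK_START):
    """Replace each None with the most recent earlier value, or the fallback."""
    filled = []
    previous = fallback
    for start in starts:
        if start is not None:
            previous = start
        filled.append(previous)
    return filled


def complete_time(start, duration):
    """ActivityCompletetime = ActivityStarttime + processingtimeActivity."""
    if start is None or duration is None:
        return None  # cannot be derived when either value is missing
    return start + duration
```

Note that filled-in time stamps are invented values, so any performance figure computed from them inherits that uncertainty.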
  • MunaMuna Posts: 23
    The data I have is from a real operating system, so for the null values I have no option but to keep them null in order to preserve the real insights.

    As for the complete time you suggested, I cannot calculate it for all activities because some ActivityStarttime values are null too. So values are missing in both columns, ActivityStartTime and processing time.

    Yes, in another post I said that I will do performance analysis (but first I need to import the file and convert it to an XLog, which is where I'm still stuck because of the null timestamps). I also intend to filter on attributes with null values before the performance analysis.

    I hope you can provide me with possible solutions for this problem, as you know more about the subject.


  • hverbeekhverbeek Posts: 426
    Hi,

    Can't you simply remove the rows with the null time stamps? They're kind of useless for performance analysis anyway...

    The only alternative I see is to work with time intervals for time stamps: this activity started not before time stamp A and not later than time stamp B. For a null time stamp, you could then copy the A time stamp from the previous row and the B time stamp from the next row. If a time stamp is known (not null), set both A and B to that time stamp. But this does not make performance analysis any simpler, as you would be dealing with intervals instead of points in time. I don't think there are any plugins available that do this.

    What else can one do if the data required to answer your questions is missing?

    Kind regards,
    Eric.
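
To make the interval idea above more concrete, here is a rough Python sketch. The function name `to_intervals` is hypothetical, and the handling of nulls at the very beginning or end of the file (the bound stays None, meaning unbounded) is an assumption that the post does not cover.

```python
from datetime import datetime


def to_intervals(starts):
    """Map each start time to a pair (A, B): started not before A, not later than B."""
    count = len(starts)
    lower = [None] * count
    upper = [None] * count

    last_known = None
    for i in range(count):  # forward pass: A is copied from the previous rows
        if starts[i] is not None:
            last_known = starts[i]
        lower[i] = last_known

    next_known = None
    for i in range(count - 1, -1, -1):  # backward pass: B is copied from the next rows
        if starts[i] is not None:
            next_known = starts[i]
        upper[i] = next_known

    return list(zip(lower, upper))
```

A known time stamp ends up with A equal to B, while a null one gets the nearest known time stamps on either side.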
  • MunaMuna Posts: 23
    Okay, I will try to remove the rows with null timestamps and get back to you. Thanks for your time.
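
Removing the rows without a start time stamp can be done before the import. A minimal Python sketch, assuming each row is a dict keyed by column name and that the start column is called `ActivityStart` (both are assumptions; adjust the name to the actual CSV header):

```python
def drop_rows_without_start(rows, column="ActivityStart"):
    """Keep only rows whose start cell is neither empty nor the text 'null'."""
    kept = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if value and value.lower() != "null":
            kept.append(row)
    return kept
```

Rows in this shape are what `csv.DictReader` from the Python standard library yields, so the filter can sit between reading the original file and writing the cleaned one.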
  • MunaMuna Posts: 23
    I tried to remove the missing values. As a consequence, we end up with a meaningless file. Here is what the file looks like (it comes from the operating system of a port):

    caseId  activity               ActivityStart      processingtimeActivity
    01      Validation             null               00:01:46
    01      Entry                  null               null
    01      Security check         null               null
    01      Regulation             null               null
    01      Entry T2               null               null
    01      Entry terminal export  01/04/2019 07:50   00:00:00
    01      Waiting Export         01/04/2019 10:50   02:59:37
    01      Boarding               null               null
    02      Validation             null               00:02:45
    02      Entry                  10/04/2019 18:09
    02      Security check         null
    02      Regulation                                00:34:43
    02      Entry T2               10/04/2019 19:36   01:26:53
    02      Entry terminal export  10/04/2019 19:36   00:00:03
    02      Waiting Export         10/04/2019 21:35   01:59:02
    02      Boarding



    I really didn't get the second solution you proposed. But if we copy timestamps from the previous or the next row, the file is no longer real or illustrative of the real use case.


    NB: the nulls mentioned in the example are actually empty cells in the CSV file.

  • hverbeekhverbeek Posts: 426
    Hi,

    To start from the beginning: What questions would you like to have answered from the data as shown above?

    Only some activities always seem to have the start time stamp and the duration. As a result, performance analysis must be restricted to these activities.

    In the data, some cells have "null" while some others are empty. Is there a difference?

    You can still do non-performance analysis, in which case you can simply ignore the time stamps and the duration, or use them where you see fit. As an example of the latter, consider case 02 in your example. Both "Entry T2" and "Entry terminal export" start at the same time, but the one that starts first ("Entry T2") takes more than an hour. This suggests that both activities run concurrently. After some preprocessing, you could replace both rows by:

    02    Entry T2+start                    10/04/2019 19:36
    02    Entry terminal export+start       10/04/2019 19:36
    02    Entry terminal export+complete    10/04/2019 19:36
    02    Entry T2+complete                 10/04/2019 21:02

    That is, if possible, split an activity into a start and complete activity, compute the time stamp of the complete activity, and sort the events in the trace accordingly.

    However, no plugins are available for this preprocessing, as it is very specific to the actual data you have.

    Kind regards,
    Eric.
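
The preprocessing described above (split each activity into a start and a complete event, compute the complete time stamp, and sort the events within the trace) might look roughly like this in Python. The function name `split_events` and the tuple layout are assumptions made for this sketch; it only handles rows where both the start time and the duration are known.

```python
from datetime import datetime, timedelta


def split_events(rows):
    """rows: (case_id, activity, start, duration) tuples with known start and duration."""
    events = []
    for case_id, activity, start, duration in rows:
        events.append((case_id, activity + "+start", start))
        events.append((case_id, activity + "+complete", start + duration))
    # sorted() is stable, so events with equal time stamps keep their input order.
    return sorted(events, key=lambda event: (event[0], event[2]))


# The two rows of case 02 discussed above.
case_02 = [
    ("02", "Entry T2", datetime(2019, 4, 10, 19, 36),
     timedelta(hours=1, minutes=26, seconds=53)),
    ("02", "Entry terminal export", datetime(2019, 4, 10, 19, 36),
     timedelta(seconds=3)),
]

for case_id, name, when in split_events(case_02):
    print(case_id, name, when.strftime("%d/%m/%Y %H:%M:%S"))
```

With these two rows, the events come out in the order Entry T2+start, Entry terminal export+start, Entry terminal export+complete, Entry T2+complete, with the last one at 21:02:53 (19:36 plus 01:26:53).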
  • MunaMuna Posts: 23
    Hi Eric,

    I don't have any specific questions to answer yet, only a global idea of what I want to do: I want to extract insights from the data beyond the control-flow perspective that discovers the process model.

    Any suggestions are very welcome, for example directions I can follow or plugins that would be useful for my case.

    Regarding your question "In the data, some cells have "null" while some others are empty. Is there a difference?": that note was only to explain that wherever I wrote null, the cell is actually empty in the CSV file. That is all.

    Now I'm going to remove all the rows that have empty data, to end up with a very specific file without empty values. I guess that's a good start, don't you think?

    If I face any issues, I'll come back to the forum again.



