[Figure: average accuracy values over three runs; three runs were used because some discovery algorithms are non-deterministic.]
[Figure: F-score values over three runs.]
PDC 2020 proposal
There are two contests: an automated contest, followed by a manual contest.
What to submit?
You should submit a working discovery algorithm that can be called through a “Discover.bat” Windows batch file, which takes two parameters:
- The full path to the discover log file, including the “.xes” extension.
- The full path to the model file where the discovered model should be stored, excluding any extension like “.pnml”, “.bpmn”, or “.lsk”.
Running the batch file should import the provided log file, discover a model from it, and export the discovered model to the provided model file (including the extension that matches the model).
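Seen from the contest infrastructure's side, the two-parameter convention above can be sketched as a small helper that builds the call; the helper name and the example paths are illustrative assumptions, not part of the contest.

```python
# Sketch of the calling convention for a submission's Discover.bat file.
# The batch-file name is fixed by the contest; everything else here is made up.

def discovery_command(discover_bat: str, log_path: str, model_base: str) -> list:
    """Build the two-parameter command line described above."""
    # First parameter: full path to the discover log, including ".xes".
    if not log_path.endswith(".xes"):
        raise ValueError("discover log must include the .xes extension")
    # Second parameter: full path to the model file, excluding any extension;
    # the batch file appends ".pnml", ".bpmn", or ".lsk" as appropriate.
    if model_base.endswith((".pnml", ".bpmn", ".lsk")):
        raise ValueError("model path must exclude the extension")
    return [discover_bat, log_path, model_base]

cmd = discovery_command("Discover.bat", "logs/example.xes", "models/example")
```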
If the result of calling your Discover.bat file as described above is a PNML file (Petri net), a BPMN file (BPMN diagram), or an LSK file (log skeleton), then you’re done. If not, the discovery algorithm needs to come with its own working classifier algorithm, that is, a “Classify.bat” Windows batch file, which takes three parameters:
- The full path to the classify log file, including the “.xes” extension.
- The full path to the model file which should be used to classify the classify log, excluding any extension like “.pnml”, “.bpmn”, or “.lsk”.
- The full path to the log file where the classified log should be stored, including the “.xes” extension.
Running the batch file should import the classify log and the model, classify the classify log using the model (adding the “pdc:isPos” attributes to the traces), and export the classified log to the provided log file.
Classification of a trace is done by adding the boolean “pdc:isPos” attribute to the trace, which should be true if the trace is classified positive (fits your model) and false if the trace is classified negative (does not fit your model).
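As a concrete illustration, a classifier could add the “pdc:isPos” attribute to every trace of a XES log roughly as sketched below, using only the Python standard library. The fits() predicate is a placeholder assumption standing in for a real replay of the trace on the discovered model.

```python
import xml.etree.ElementTree as ET

def classify_log(xes_text: str, fits) -> str:
    """Add a boolean "pdc:isPos" attribute to every trace in the log."""
    root = ET.fromstring(xes_text)
    for trace in root.iter("trace"):
        # fits(trace) is a placeholder for checking the trace against the model.
        attr = ET.SubElement(trace, "boolean")
        attr.set("key", "pdc:isPos")
        attr.set("value", "true" if fits(trace) else "false")
    return ET.tostring(root, encoding="unicode")

# Tiny two-trace log; a real XES log has events inside each trace.
classified = classify_log("<log><trace/><trace/></log>", fits=lambda t: True)
```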
The 8 pre-existing discovery algorithms do not count as submissions, and are hence not participating in this contest. They are just there to show the current state of the discovery field. If the authors of these algorithms want to participate, they should explicitly submit the algorithm.
When to submit?
As soon as possible, but not later than August 17th, 2020. You can submit as many times as you like. Note that a new submission does not replace an old submission. Every submission counts.
How to submit?
Please provide us (by email) with a link to an archive containing your submission. Note that the archive should be self-contained: If unpacked, the algorithm(s) should be able to run without using any other software.
What feedback do I get?
The F-score results obtained using your algorithm(s).
The winning submission is the one with the highest average F-score over all 192 classified logs in the data set. This F-score is computed from the accuracy on the positive (fitting) traces and the accuracy on the negative (non-fitting) traces. Every classify log contains about as many positive as negative traces. Note that the 192 logs are not disclosed for this contest.
What to submit?
A classification of the most complex classify log from the most complex discover log in the data set (“pdc_2020_1211111.xes”). This classification consists of the classified log as described above: for every trace in the classify log, the attribute “pdc:isPos” is added to indicate whether this trace is classified as positive (true) or negative (false) by your model.
You may use any way you see fit to create this classified log.
When to submit?
After August 17th, 2020 (these two most complex logs will be disclosed on August 18th, 2020) but not later than August 31st, 2020.
How to submit?
Please send us (by email) the classified log. You can submit as many times as you like, but a new submission does replace an old submission. Only your latest submission counts.
What feedback do I get?
The F-score obtained on this most complex log in the data set. This F-score is computed from the accuracy on the positive (fitting) traces and the accuracy on the negative (non-fitting) traces.
This contest takes place after the deadline for submission to the automated contest, as it requires the most complex log of the PDC 2020 data set (pdc_2020_1211111.xes) to be disclosed. This log is disclosed so that one can submit a (manual) classification for it within, say, one or two weeks time.
For this contest, the winner is the one that scores best on this most complex log of the PDC 2020 data set. At the moment, the score to beat is 74% by the Log Skeleton (5% noise).
By comparing the results from the manual contest with the results for this most complex log from the automated contest, we can get an idea of how big the gap to bridge is for new discovery algorithms.
PDC 2021 and onwards
As submissions for the automated contest include an implemented discovery algorithm, we can collect these implementations and use them as references for later contests. For later contests, we could simply add all submitted discovery algorithms to the collection of default discovery algorithms. This way, we may be able to build a collection of discovery algorithms that we can compare on any event log.
In case a new contest comes with a new dimension (like data-aware vs. non-data-aware), we can run all existing discovery algorithms on the corresponding logs. This way, we know how well the existing algorithms handle this new dimension.
The accuracy issue
The main downside I see in this proposal is that it (again) relies on the accuracy measure to check the quality of the discovery. I agree that this is a downside, but I see no real alternative. Even if we restrict the process models to Petri nets, how should we measure the quality of the discovery? Petri nets with quite different structures may have quite similar behavior, while Petri nets with quite similar structures may have quite different behavior. How, then, should we check quality? Therefore, unless somebody comes up with a workable alternative for checking the quality of a discovery, accuracy it is (IMHO).
The F-score used is the F-score over the accuracy of the positive (fitting) traces and the accuracy of the negative (non-fitting) traces. As a result, if a model classifies all traces as negative, the F-score will be 0%.
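Concretely, this F-score is the harmonic mean of the two accuracies, which can be sketched as follows:

```python
def f_score(pos_accuracy: float, neg_accuracy: float) -> float:
    """Harmonic mean of the accuracy on positive and on negative traces."""
    if pos_accuracy + neg_accuracy == 0.0:
        return 0.0
    return 2.0 * pos_accuracy * neg_accuracy / (pos_accuracy + neg_accuracy)

# A model that classifies every trace as negative has positive accuracy 0,
# so its F-score is 0%, as stated above.
all_negative = f_score(0.0, 1.0)  # → 0.0
```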
The non-determinism we encountered in some miners is penalized by using the minimal (positive and negative) accuracy values over the different runs. As an example, if three runs were made, and if for the same event log containing 500 positive traces 495, 490, and 500 traces were classified as positive, then the positive accuracy is 490/500 = 98%.
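The penalty amounts to taking the minimum over the runs, reusing the numbers from the example above:

```python
def min_positive_accuracy(positives_per_run, total_positive):
    """Minimal positive accuracy over runs of a non-deterministic miner."""
    return min(positives_per_run) / total_positive

# Three runs on a log with 500 positive traces, classifying
# 495, 490, and 500 traces as positive, respectively.
acc = min_positive_accuracy([495, 490, 500], 500)  # → 0.98
```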