Process Discovery Contest @ BPM 2016

Background

Process mining is a relatively young research discipline that sits between computational intelligence and data mining on the one hand, and process modeling and analysis on the other hand. The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today's (information) systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. The lion's share of attention in process mining has been devoted to process discovery, namely extracting process models - mainly business process models - from an event log.

The IEEE CIS Task Force on Process Mining aims to promote research in the field of process mining and its application in real settings. In collaboration with it, and to foster research in the area of process discovery, we are proud to introduce the 1st Process Discovery Contest, which will be co-located with the BPM 2016 conference in Rio de Janeiro in September 2016.

Objectives and Context

The Process Discovery Contest is dedicated to the assessment of tools and techniques that discover business process models from event logs. The objective is to compare the efficiency of techniques to discover process models that provide a proper balance between "overfitting" and "underfitting". A process model is overfitting (the event log) if it is too restrictive, disallowing behavior which is part of the underlying process. This typically occurs when the model only allows for the behavior recorded in the event log. Conversely, it is underfitting (the reality) if it is not restrictive enough, allowing behavior which is not part of the underlying process. This typically occurs if it overgeneralizes the example behavior in the event log.

A number of event logs will be provided. These event logs are generated from business process models that show different behavioral characteristics. The process models will be kept secret: only "training" event logs showing a portion of the possible behavior will be disclosed. The winner is/are the contestant(s) that provide(s) the technique that can discover process models that are the closest to the original process models, in terms of balancing "overfitting" and "underfitting". To assess this balance we take a classification perspective where a "test" event log will be used. The test event log contains traces representing real process behavior and traces representing behavior not related to the process. Each trace of the training and test logs records a complete execution of one instance of the business process, i.e., all events of one process instance from the start state to the end state.

A model is as good at balancing "overfitting" and "underfitting" as it is able to correctly classify the traces in the "test" event log: a trace that represents real process behavior should be classified as allowed by the model, whereas a trace that represents behavior not related to the process should be classified as disallowed.

With this classification view, the winner is/are the contestant(s) who correctly classify the largest number of traces over all the test event logs. All event logs will have the same weight.
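To make the evaluation criterion concrete, here is a minimal sketch of the scoring rule under the stated assumptions (hypothetical trace identifiers and labels; this is not the organizers' actual evaluation script):

    # Illustrative sketch of the scoring rule: every test event log carries
    # the same weight, and the score is the total number of correctly
    # classified traces. Trace identifiers and labels are hypothetical.

    def score_log(predicted, ground_truth):
        """Count traces whose predicted label (True = fitting) matches the truth."""
        return sum(1 for trace_id, label in ground_truth.items()
                   if predicted.get(trace_id) == label)

    def total_score(predictions_per_log, truth_per_log):
        """Sum the per-log scores; all event logs have the same weight."""
        return sum(score_log(predictions_per_log.get(log, {}), truth)
                   for log, truth in truth_per_log.items())

    # Toy example with two logs of two traces each:
    truth = {"log_1": {"t1": True, "t2": False}, "log_2": {"t1": True, "t2": True}}
    preds = {"log_1": {"t1": True, "t2": True},  "log_2": {"t1": True, "t2": True}}
    print(total_score(preds, truth))  # 3 of 4 traces classified correctly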

Additionally, CPU time may be used to break ties between tools that perform identically on the aforementioned criterion. The amount of RAM that can be employed is limited to 4 GB. The contest is not restricted to any modelling notation and no preference is made: any procedural (e.g., Petri net or BPMN) or declarative (e.g., Declare) notation is equally welcome. The contest is also not restricted to open-source tools; proprietary tools can participate as well.

The results and the winner will be announced on September 18th, 2016 during the BPI workshop that is co-located with the BPM 2016 conference in Rio de Janeiro, Brazil. The winner will be given a chance to present the approach/technique that has led to the victory.

A trophy will be awarded during the conference dinner of the BPM 2016 conference.

A contestant can be a single individual or a group that belongs to any institution, public or private. A prize will be awarded to the winning contestant. Take a look at the Prize section below.

Positioning of the Process Discovery Contest

The only other contest related to process mining is the annual Business Process Intelligence Challenge (BPIC). The BPIC uses real-life data without objective evaluation criteria: it is about the perceived value of the analysis and is not limited to the discovery task (it also covers conformance checking, performance analysis, etc.). The report is evaluated by a jury. The Process Discovery Contest is different. The focus is on process discovery. Synthetic data are used to have an objective "proper" answer. Process discovery is turned into a classification task with a training set and a test set. A process model needs to decide whether traces are fitting or not.

Organizers

Key Dates

To provide support, contestants can contact the organizers at any moment to express their intention to submit. To all contestants who expressed their intention, the organizers will send two test event logs for each of the 10 process models on 15 April 2016 and 15 May 2016, respectively. Each of these event logs will contain 10 traces that can be replayed and 10 traces that cannot be replayed on the respective process model. However, no information will be given about which of the traces can or cannot be replayed. The contestants can submit their classification attempts to the organizers, who will reply by stating how many traces have been correctly classified. The two feedback loops can be used as a means to assess the effectiveness of the discovery algorithms.

The complete call for submissions provides additional information (please note that the document has not been updated with respect to the new deadline and the prizes). Also, the appendix of the call discusses the behavioral characteristics of the process models from which the provided event logs were generated.

Where and how to submit

No later than 3 July 2016, each contestant needs to submit a document that at least contains the following sections:

No specific format is requested for this document.

Contestants submit the document by sending an email with subject “Process Discovery Contest - Submission” to discoverycontest@tue.nl. The same email should also be used for those who want to express their intention to submit.

For additional information, download the complete call for submissions (please note that the document has not been updated with respect to the new deadline and the prizes).

Prizes and Winner

Prize ceremony during the social dinner of the BPM 2016 conference

The results and the winner were announced on September, 18th, 2016 during the BPI workshop that is co-located with the BPM 2016 conference, Rio de Janeiro, Brazil. The winner was given a chance to present the approach/technique during the BPI workshop.

The challenge is sponsored by Celonis, which kindly covered the expenses of one flight to Rio de Janeiro and four nights in a hotel to attend the prize ceremony.

The winning group for 2016 was H.M.W. Verbeek and F. Mannhardt (Eindhoven University of Technology), with The DrFurby Classifier.

The details of the technique employed by the winning group are reported in:

H.M.W. Verbeek and F. Mannhardt. The DrFurby Classifier submission to the Process Discovery Contest @ BPM 2016. BPM Center Report BPM-16-08, BPMCenter.org, 2016.

 

Event-log Datasets

Training Event Log

The following zip file includes the 10 training event logs for the 10 process models under consideration:

This zip file contains 20 files: Each event log is provided in both CSV and XES formats. For more information about the XES format, please refer to http://www.processmining.org/openxes/start.

Each CSV file is a different event log. Each CSV-file row represents a different event and consists of two (comma-separated) values in the following order:

  1. The name of the process-model activity to which the event refers;
  2. The identifier of the case to which the event belongs.

It is worth highlighting that the events do not have timestamps associated with them. Contestants are allowed to add timestamps to the event logs if the specific algorithm being analyzed requires them.
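As an illustration, here is a minimal sketch of reading such a CSV file and grouping its events into one activity sequence per case. It assumes there is no header row and that the events of a case appear in execution order (since there are no timestamps); the file name in the usage comment is hypothetical:

    import csv
    from collections import defaultdict

    def read_traces(path):
        """Group the events of a contest CSV log into one activity sequence per case.

        Each row is assumed to hold two comma-separated values in this order:
        activity name, case identifier (no header row, no timestamps).
        """
        traces = defaultdict(list)
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                activity, case_id = row[0].strip(), row[1].strip()
                traces[case_id].append(activity)
        return dict(traces)

    # Hypothetical usage:
    # traces = read_traces("training_log_1.csv")
    # print(len(traces), "cases;", sum(map(len, traces.values())), "events")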

Test Event Log

The following zip file includes the 10 test event logs for the 10 process models under consideration:

This zip file contains 20 files: Each event log is provided in both CSV and XES formats.

Each event log contains 10 traces that are fitting according to the reference models (see below) and 10 traces that are not fitting.
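To make the accept/reject decision concrete, below is a deliberately simple sketch in which the "model" is just the set of start activities, end activities, and directly-follows pairs observed in a training log, and a test trace is classified as fitting only if it conforms to these. This is purely an illustration of how a discovered model can classify traces; it is not a competitive discovery technique and not the approach used by the contest submissions.

    def learn_model(training_traces):
        """Build a toy 'model': observed start activities, end activities,
        and directly-follows pairs from the training traces."""
        starts, ends, follows = set(), set(), set()
        for trace in training_traces:
            starts.add(trace[0])
            ends.add(trace[-1])
            follows.update(zip(trace, trace[1:]))
        return starts, ends, follows

    def is_fitting(trace, model):
        """Accept a trace only if it starts, ends, and moves as seen in training."""
        starts, ends, follows = model
        return (trace[0] in starts and trace[-1] in ends
                and all(pair in follows for pair in zip(trace, trace[1:])))

    # Toy example: training behavior a-b-c and a-c; the second test trace deviates.
    model = learn_model([["a", "b", "c"], ["a", "c"]])
    print(is_fitting(["a", "b", "c"], model))  # True
    print(is_fitting(["a", "c", "b"], model))  # False (c->b never observed)

Note that such a model only allows behavior it has already seen, so by construction it leans toward overfitting the training log; this is precisely the trade-off the contest evaluates.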

Process Models

The following zip file includes the 10 BPMN process models that were used in the contest:

The format of the files is compliant with the OMG BPMN standard. They can be loaded in ProM, the Signavio process editor, and many other process-modelling tools.

Celonis is an innovative software vendor fully dedicated to developing the leading process mining technology for the enterprise. The vision at Celonis is simple: to make the world more efficient. Celonis strongly believes that every single company has the potential for outstanding performance and wants to support them by optimizing their processes. By using Celonis Process Mining, companies have full transparency and therefore full control over every single process. Celonis believes process mining is one of the important data analytics technologies of the future and, therefore, is more than happy to support the current and future brains of this industry: "Being the sponsor of the discovery contest was an affair of the heart for us, and we wish every participant good luck and hope to see more of their work in the future."