IEEE CIS Task Force on Process Mining

Process Discovery Contest @ BPM 2016

Background

Process Mining is a relatively young research discipline that sits between computational intelligence and data mining on the one hand, and process modeling and analysis on the other. The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today's (information) systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. The lion's share of attention in Process Mining has been devoted to Process Discovery, namely extracting process models - mainly business process models - from an event log.

The IEEE CIS Task Force on Process Mining aims to promote research in the field of process mining and its application in real settings. To foster research in the area of process discovery, and in collaboration with the Task Force, we are proud to introduce the 1st Process Discovery Contest, which will be co-located with the BPM 2016 Conference in Rio de Janeiro in September 2016.

Objectives and Context

The Process Discovery Contest is dedicated to the assessment of tools and techniques that discover business process models from event logs. The objective is to compare the effectiveness of techniques at discovering process models that strike a proper balance between “overfitting” and “underfitting”. A process model overfits (the event log) if it is too restrictive, disallowing behavior that is part of the underlying process. This typically occurs when the model only allows for the behavior recorded in the event log. Conversely, it underfits (the reality) if it is not restrictive enough, allowing behavior that is not part of the underlying process. This typically occurs if it overgeneralizes the example behavior in the event log. A number of event logs will be provided. These event logs are generated from business process models that show different behavioral characteristics. The process models will be kept secret: only “training” event logs showing a portion of the possible behavior will be disclosed. The winner(s) will be the contestant(s) whose technique discovers the process models closest to the original ones, in terms of balancing “overfitting” and “underfitting”. To assess this balance we take a classification perspective, where a “test” event log will be used. The test event log contains traces representing real process behavior and traces representing behavior not related to the process. Each trace of the training and test logs records a complete execution of an instance of the business process. In other words, each trace records all events of one process instance from the start state to the end state.

A model is as good in balancing “overfitting” and “underfitting” as it is able to correctly classify the traces in the “test” event log:

  • Given a trace representing real process behavior, the model should classify it as allowed.
  • Given a trace representing a behavior not related to the process, the model should classify it as disallowed.

With this classification view, the winner(s) is/are the contestant(s) who correctly classify the largest number of traces over all the test event logs. All event logs will have the same weight.
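As an illustration of this scoring scheme, the following sketch counts correctly classified traces across all test logs. It is a minimal sketch, not the organizers' evaluation code: the function classify stands for whatever replay check a contestant's tool provides, and all names are hypothetical.

    # Minimal sketch of the contest's scoring scheme (illustrative only).

    def score(models, classify, test_logs, ground_truth):
        """Count correctly classified traces over all test event logs.

        models       : {log_id: discovered model for that log}
        classify     : classify(model, trace) -> True iff the model allows the trace
        test_logs    : {log_id: [trace, ...]}
        ground_truth : {log_id: [bool, ...]}, True = real process behavior
        """
        correct = 0
        for log_id, traces in test_logs.items():
            model = models[log_id]
            for trace, is_real in zip(traces, ground_truth[log_id]):
                if classify(model, trace) == is_real:
                    correct += 1
        return correct  # all logs weigh equally: each trace counts once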

Additionally, CPU time may be used to break ties between tools that perform identically on the aforementioned criteria. There is also a 4 GB limit on the amount of RAM that can be employed. The contest is not restricted to any modelling notation and no preference is made: any procedural (e.g., Petri net or BPMN) or declarative (e.g., Declare) notation is equally welcome. The contest is also not restricted to open-source tools; proprietary tools can also participate.

The results and the winner will be announced on September 18th, 2016 during the BPI workshop, which is co-located with the BPM 2016 conference in Rio de Janeiro, Brazil. The winner will be given a chance to present the approach/technique that has led to the victory.

A trophy will be awarded during the conference dinner of the BPM 2016 conference.

A contestant can be a single individual or a group belonging to any institution, public or private. A prize will be awarded to the winning contestant; see the Prizes section below.

Positioning of the Process Discovery Contest

The only other contest related to process mining is the annual Business Process Intelligence Challenge (BPIC). The BPIC uses real-life data without objective evaluation criteria: it is about the perceived value of the analysis and is not limited to the discovery task (it also covers conformance checking, performance analysis, etc.), with reports evaluated by a jury. The Process Discovery Contest is different. The focus is on process discovery, and synthetic data are used so that there is an objective “proper” answer. Process discovery is turned into a classification task with a training set and a test set: a process model needs to decide whether traces are fitting or not.

Organizers

  • Josep Carmona, Universitat Politècnica de Catalunya (UPC), Spain
  • Massimiliano de Leoni, Eindhoven University of Technology, The Netherlands
  • Benoît Depaire, Hasselt University, Belgium.
  • Toon Jouck, Hasselt University, Belgium.

Key Dates

  • 3 July 2016 (extended from 15 June 2016): Submission of one process model for each of the 10 event logs used in the contest.
  • 13 July 2016 (extended from 20 June 2016): On this web site, the following items will be published: a) the 10 test logs, each containing 20 traces, that are used to score the submissions; b) the 10 reference process models in BPMN that have been used to generate the event logs.
  • 18-19 September 2016: The winner will be announced and a prize will be given.

To provide support, contestants can at any moment contact the organizers to express their intention to submit. To all contestants who have expressed this intention, the organizers will send two test event logs for each of the 10 process models, on 15 April 2016 and 15 May 2016, respectively. Each of these event logs contains 10 traces that can be replayed on the respective process model and 10 traces that cannot. However, no information will be given about which of the traces can or cannot be replayed. Contestants can submit their classification attempt to the organizers, who will reply by stating how many traces have been correctly classified. These two feedback loops can be used as a means to assess the effectiveness of the discovery algorithms.

The complete call for submissions provides additional information (please note that the document is not updated w.r.t. the new deadline and the prizes). Also, the appendix of the call discusses the behavioral characteristics of the process models from which the provided event logs were generated.

Where and how to submit

Not later than 3 July 2016, each contestant needs to submit a document that at least contains the following sections:

  • One section that discusses the replaying semantics of the process modelling notation that has been employed. In other words, the section needs to discuss how, given any process trace t and any process model m in that notation, it can be unambiguously determined whether or not trace t can be replayed on model m (a minimal illustration of such semantics is sketched below). As an alternative to this section, the contestant can provide a link to a paper or any other document where the replaying semantics is described.
  • One section that contains the pictures of the 10 process models that have been discovered from the 10 event logs.
  • One section that provides a link where one can download the tool(s) used to discover the process models, as well as a step-by-step guide to generate one of the process models. In case the tool is not open-source, a license needs to be provided, which needs to be valid at least until 30 September 2016. The license will only be used by the organizers to evaluate the submission.

No specific format is requested for this document.
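To make the replaying-semantics requirement concrete, the sketch below shows one possible unambiguous semantics for a simple finite-state model given as a transition map. This is only an illustration under assumed names; the contest does not prescribe any notation, and contestants define the semantics of their own notation.

    # Illustrative replaying semantics for a simple finite-state model.
    # The model maps (state, activity) -> next state; a trace is replayable
    # iff it drives the model from the initial state to an accepting state.

    def replayable(trace, transitions, initial, accepting):
        state = initial
        for activity in trace:
            key = (state, activity)
            if key not in transitions:
                return False          # activity not allowed in this state
            state = transitions[key]
        return state in accepting     # the trace must end in an end state

    # Example: a model allowing exactly the traces <a, b, c> and <a, c, b>
    transitions = {
        ("s0", "a"): "s1",
        ("s1", "b"): "s2", ("s2", "c"): "s4",
        ("s1", "c"): "s3", ("s3", "b"): "s4",
    }
    print(replayable(["a", "b", "c"], transitions, "s0", {"s4"}))  # True
    print(replayable(["a", "b"], transitions, "s0", {"s4"}))       # False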

Contestants submit the document by sending an email with subject “Process Discovery Contest - Submission” to discoverycontest@tue.nl. The same email should also be used for those who want to express their intention to submit.

For additional information, click here to download the complete call for submissions (please note that the document is not updated w.r.t. the new deadline and the prizes).

Prizes and Winner

Prize ceremony during the social dinner of the BPM 2016 conference

The results and the winner were announced on September 18th, 2016 during the BPI workshop co-located with the BPM 2016 conference in Rio de Janeiro, Brazil. The winner was given a chance to present the approach/technique during the BPI workshop.

The challenge was sponsored by Celonis, which kindly covered the expenses of one flight to Rio de Janeiro and 4 nights in a hotel to attend the prize ceremony.

The winning group for 2016 was: H.M.W. Verbeek and F. Mannhardt (Eindhoven University of Technology), with The DrFurby Classifier.

The details of the technique employed by the winning group are reported in:

H.M.W. Verbeek and F. Mannhardt. The DrFurby Classifier submission to the Process Discovery Contest @ BPM 2016. BPM Center Report BPM-16-08, BPMCenter.org, 2016.

 

Event-log Datasets

Training Event Log

The following zip file includes the 10 event logs for the 10 process models in consideration:

This zip file contains 20 files: each event log is provided in both CSV and XES formats. For more information about the XES format, please refer to http://www.processmining.org/openxes/start.

Each CSV file is a different event log. Each CSV-file row represents a different event and consists of two (comma-separated) values in the following order:

  1. The name of the process-model activity to which the event refers;
  2. The identifier of the case to which the event belongs.

It is worth highlighting that the events do not have an associated timestamp. Contestants are allowed to add timestamps to the event logs if the specific algorithm being used requires them.
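For instance, the traces of a log can be reconstructed by grouping the rows by case identifier. Below is a minimal sketch, assuming one of the training logs has been saved as log.csv (the file name is illustrative); since no timestamps are present, rows are assumed to be ordered by occurrence within each case.

    import csv
    from collections import defaultdict

    # Group events into traces by case identifier. The rows of each case
    # are assumed to appear in order of occurrence, as no timestamps exist.
    traces = defaultdict(list)
    with open("log.csv", newline="") as f:
        for activity, case_id in csv.reader(f):
            traces[case_id].append(activity)

    for case_id, activities in traces.items():
        print(case_id, activities)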

Test Event Log

The following zip file includes the 10 event logs for the 10 process models in consideration:

This zip file contains 20 files: each event log is provided in both CSV and XES formats.

Each event log contains 10 traces that are fitting according to the reference models (see below) and 10 traces that are not fitting.

Process Models

The following zip file includes the 10 BPMN process models that were used in the contest:

The format of the files is compliant with the OMG standard. They can be loaded into ProM, the process editor of Signavio, and many other process-modelling tools.

Celonis is an innovative software vendor fully dedicated to developing the leading process mining technology for the enterprise. The vision at Celonis is simple: to make the world more efficient. Celonis strongly believes that every single company has the potential for outstanding performance, and it wants to support them by optimizing their processes. By using Celonis Process Mining, companies have full transparency and therefore full control over every single process. Celonis believes Process Mining is one of the important data analytics technologies of the future and, therefore, is more than happy to support the current and future brains of this industry: “Being the sponsor of the discovery contest was an affair of the heart for us, and we wish every participant good luck and hope to see more of their work in the future.”