14th International Workshop on Business Process Intelligence 2018

to be held in conjunction with BPM 2018 Sydney, Australia, September 9 - 14, 2018



Differences

This shows you the differences between two versions of the page.

2018:challenge [2018/02/07 13:33]
bfvdonge [The Data]
2018:challenge [2018/09/18 12:21] (current)
bfvdonge [The Challenge]
Line 36: Line 36:
 We strongly encourage people to use any tools, techniques, methods at their disposal. There is no need to restrict to open-source tools, and proprietary tools as well as techniques developed or implemented specifically for this challenge are welcome.
  
-Our industrial sponsors provide access to their tools for use with the BPI Challenge dataset. If you would like to use Celonis on this data, please contact them directly on [[BPI2018@celonis.com|BPI2018@celonis.com]]. If you would like to try minit on this dataset, please contact minit on [[BPI2018@minitlabs.com|BPI2018@minitlabs.com]].
+Our industrial sponsors provide access to their tools for use with the BPI Challenge dataset. If you would like to use Celonis on this data, please contact them directly on [[BPI2018@celonis.com|BPI2018@celonis.com]]. If you would like to try minit on this dataset, please contact minit on [[BPI2018@minit.io|BPI2018@minit.io]].
  
 ===== Important Dates =====
Line 78: Line 78:
 We expect participants can focus on a //specific aspect// of interest and analyze this aspect in great detail. Here, one can choose for example to focus on specific models, such as control-flow models, social network models, performance models, predictive models, etc.
  
 +The winner: **Jarno Brils, Nina van den Elsen, Jan de Priester and Tom Slooff** of the **[[https://educationguide.tue.nl/programs/tue-honors-academy/|Honors Academy of Eindhoven University of Technology]]** with their report entitled
 +//​{{:​2018:​bpi2018_paper_15.pdf|Analysis and Prediction of Undesired Outcomes}}//​
  
 === The Academic Category ===
 This category targets academics. The focus in this category is much more on the novelty of the techniques applied than the actual results. This provides a great opportunity for BPI researchers to show the practical applicability of their tools and/or techniques on real-life data.
 +
 +The winner: **Stephen Pauwels and Toon Calders** of the **[[https://​www.uantwerpen.be/​nl/​|University of Antwerp]]** with their report entitled ​
 +//​{{:​2018:​bpi2018_paper_10.pdf|Detecting and Explaining Drifts in Yearly Grant Applications}}//​
  
 === The Professional Category ===
 This category targets professionals to show their skills in analyzing business processes. The submitted reports are judged on their level of professionalism. The participants are expected to report on a broader range of aspects, where each aspect does not have to be developed in full detail. The report submitted in this category will be judged on its //completeness of analysis// and usefulness for the purpose of a real-life business improvement setting.
  
-A jury will decide which report is best and the winning participant in each category will be invited to come to Sydney, Australia to receive a prize and to present their findings
+The winner: **Lalit Wangikar, Sumit Dhuwalia, Abhilasha Yadav, Bhavy Dikshit and Dikshant Yadav** from **[[https://www.cognitioanalytics.com/|Cognitio Analytics]]** with their report entitled
 +//​{{:​2018:​bpi2018_paper_20.pdf|Faster Payments to Farmers: Analysis of the Direct Payments Process of EU's Agricultural Guarantee Fund}}// 
 + 
 +The winners were selected by a jury and presented their findings at the workshop in Sydney, Australia!
  
-We strongly encourage people to use any tools, techniques, methods at their disposal. There is no need to restrict to open-source tools, and proprietary tools as well as techniques developed or implemented specifically for this challenge are welcome. Both sponsors make their tools available for use with the BPI challenge data. Information on how to contact them will follow soon. If you want to use [[http://​www.promtools.org/​|ProM]] with this data, please make sure to use [[http://​www.promtools.org/​prom6/​downloads/​prom-lite-1.2-jre7-installer.exe|ProM Lite 1.2]] or later or a Nightly build. The data does not load correctly in ProM Lite 1.1 and earlier. 
  
  
Line 109: Line 117:
 In total, the event log contains 2,514,266 events for 43,809 applications over a period of three years. The shortest case contains 24 events, the longest 2973 and on average there are 57 events per case referring to 14 activities. As mentioned, the data is centered around documents and for your convenience, we provide both the complete log file as well as log files for each document type, in which each instance of a document is a case. We expect to publish the data in the 4TU datacenter soon!
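As a quick sanity check on the figures above, the stated average follows from the two totals. The snippet below is only a sketch: the helper name `summarize_case_lengths` is hypothetical and the three-element sample list is made up for illustration, not taken from the log.

```python
def summarize_case_lengths(case_lengths):
    """Return (shortest, longest, mean) for an iterable of events-per-case counts."""
    lengths = list(case_lengths)
    return min(lengths), max(lengths), sum(lengths) / len(lengths)


# Totals as stated in the text above.
total_events = 2_514_266
total_cases = 43_809

# Mean number of events per case; rounds to the 57 quoted in the text.
average_events_per_case = total_events / total_cases
print(round(average_events_per_case))

# Illustrative call on a made-up list of case lengths (not real data).
print(summarize_case_lengths([24, 57, 2973]))
```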
  
-There are seven different document types in the data. However, one document changed between 2015 and 2016 (and then again in 2017), hence the data shows the following eight document types:
+There are nine different document types in the data listed in the table below. From 2015 to 2016, the Parcel document was succeeded by the Geo Parcel Document. In 2017, the Geo Parcel document also replaced the Department Control Parcels document.
  
 <​html><​table>​ <​html><​table>​
Line 120: Line 128:
 <​td ​ style="​vertical-align:​top;​text-align:​top">​A document containing the summarized results of various checks (reference alignment, department control, inspections) ​ <​td ​ style="​vertical-align:​top;​text-align:​top">​A document containing the summarized results of various checks (reference alignment, department control, inspections) ​
 </tr> </tr>
-<tr bgcolor="#F0F0D2"><td  style="vertical-align:top;text-align:top">Department control parcels </td>
+<tr bgcolor="#F0F0D2"><td  style="vertical-align:top;text-align:top">Department control parcels (before 2017)</td>
 <​td ​ style="​vertical-align:​top;​text-align:​top">​Main</​td>​ <​td ​ style="​vertical-align:​top;​text-align:​top">​Main</​td>​
 <​td ​ style="​vertical-align:​top;​text-align:​top">​A document containing the results of checks regarding the validity of parcels of a single applicant ​ <​td ​ style="​vertical-align:​top;​text-align:​top">​A document containing the results of checks regarding the validity of parcels of a single applicant ​
Line 136: Line 144:
 <​td ​ style="​vertical-align:​top;​text-align:​top">​The document containing all parcels for which subsidies are requested ​ <​td ​ style="​vertical-align:​top;​text-align:​top">​The document containing all parcels for which subsidies are requested ​
 </tr> </tr>
-<tr bgcolor="#F0F0D2"><td  style="vertical-align:top;text-align:top">Geo Parcel Document (replaces Parcel document since 2016)</td>
+<tr bgcolor="#F0F0D2"><td  style="vertical-align:top;text-align:top">Geo Parcel Document (replaces Parcel document since 2016 and Department control parcels since 2017)</td>
 <​td ​ style="​vertical-align:​top;​text-align:​top">​Main<​br/>​Declared<​br/>​Reported</​td>​ <​td ​ style="​vertical-align:​top;​text-align:​top">​Main<​br/>​Declared<​br/>​Reported</​td>​
 <​td ​ style="​vertical-align:​top;​text-align:​top">​The document containing all parcels for which subsidies are requested. From 2017, the Geo Parcel Document also replaces the Department control parcels document. <​td ​ style="​vertical-align:​top;​text-align:​top">​The document containing all parcels for which subsidies are requested. From 2017, the Geo Parcel Document also replaces the Department control parcels document.
Line 153: Line 161:
  
 ===== Download =====
-The data will be made available through the 4TU Center for research data as usual. However, for your convenience, we have the data ready for download right now:
+The data is made available through the [[https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972|4TU Center for research data]] as usual. However, for your convenience, we have the data ready for download right now:
-  * [[https://www.dropbox.com/s/es1bazlo05h4m3z/application.xes.gz?dl=0|Application log (xes.gz, 150MB)]] This log contains all event data for three years with application as a case ID,
+  * [[https://data.4tu.nl/repository/uuid:3301445f-95e8-4ff0-98a4-901f1f204972/DATA1|Application log (xes.gz, 150MB)]] This log contains all event data for three years with application as a case ID,
   * [[https://www.dropbox.com/s/9awslbvm9uz9mnu/document_logs.zip?dl=0|Document logs (zip, 150MB)]] This collection contains eight log files, one for each document type. In each file, only those events relevant for a document are included.
  
-Please note that these links and files are temporary. The final logs will be published through 4TU. They are expected to be identical to these logs.
+When you use this data, please cite it as "<html>van Dongen, B.F. (Boudewijn); Borchert, F. (Florian) (2018) BPI Challenge 2018. Eindhoven University of Technology. Dataset. <a href="https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972">https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972</a></html>". The BibTeX or other formats can be downloaded from [[https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972/object/citation]].
 ==== Trace attributes ====
  
Line 391: Line 399:
   * Undesired outcome 1: The payment is late. A payment can be considered timely, if there has been a "begin payment" activity by the end of the year that was not eventually followed by "abort payment".
  
-  * Undesired outcome 2: The case needs to be reopened, either by the department (“change by dep.”) or due to a legal objection by the applicant (“objection”). This may result in additional payments or reimbursements (“payment_actual{x}” > 0, where x <html>&ge;</html> 1 refers to the xth payment after the initial one)
+  * Undesired outcome 2: The case needs to be reopened, either by the department (subprocess "Change") or due to a legal objection by the applicant (subprocess "Objection"). This may result in additional payments or reimbursements (“payment_actual{x}” > 0, where x <html>&ge;</html> 1 refers to the xth payment after the initial one)
  
-**Question**: We would like to detect such cases as early as possible. Ideally, this should happen before a decision is made for this case (first occurrence of “DP application decide”). You may use data from previous years to make predictions for the current year.
+**Question**: We would like to detect such cases as early as possible. Ideally, this should happen before a decision is made for this case (first occurrence of “Payment application+application+decide”). You may use data from previous years to make predictions for the current year.
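The two undesired outcomes defined above can be turned into per-case labels, for instance as training targets for a predictor. The following is a minimal sketch, not the organizers' labeling procedure: the `Event` class, its field names, and both function names are hypothetical, and it assumes each trace is already sorted by timestamp and that the activity and subprocess labels match the strings quoted in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    activity: str        # assumed to hold labels such as "begin payment"
    subprocess: str      # assumed to hold labels such as "Change" or "Objection"
    timestamp: datetime


def payment_is_late(trace, year):
    """Undesired outcome 1: True unless some "begin payment" happened by the
    end of `year` and was never followed by an "abort payment"."""
    year_end = datetime(year + 1, 1, 1)
    for i, event in enumerate(trace):
        if event.activity == "begin payment" and event.timestamp < year_end:
            aborted_later = any(
                later.activity == "abort payment" for later in trace[i + 1:]
            )
            if not aborted_later:
                return False  # a timely, unaborted payment exists
    return True


def case_is_reopened(trace):
    """Undesired outcome 2: True if any event belongs to the "Change" or
    "Objection" subprocess."""
    return any(event.subprocess in {"Change", "Objection"} for event in trace)
```

For early detection, the features fed to a model would be computed only from the events before the first decision activity, while the labels above are computed from the complete trace.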
  
 ==== Prediction of penalties (risk assessment) ====
Line 400: Line 408:
 This may occur for a variety of reasons, e.g., the stated size of the farmland did not match the actual size as determined by alignment with the reference or a remote or on-site inspection. Other reasons include the violation of cross-compliance rules or noncompliance with the young farmer condition.
  
-The occurrence of such a penalty is indicated by the cut amount (“penalty_amount{x}”) and a code for one or more reasons (“penalty_{xxx}”). Some of these are considered more severe (namely: B3, B4, B5, B6, B16, BGK, C16, JLP3, V5).
+The occurrence of such a penalty is indicated by the cut amount (“penalty_amount{x}”) and a code for one or more reasons (“penalty_{xxx}”). Some of these are considered more severe (namely: B3, B4, B5, B6, B16, BGK, C16, JLP3, V5 and BGP, BGKV, B5F in Q2).
 A certain amount of applications is selected for the more rigorous (on-site) inspection. This may either happen due to an internal risk assessment (“selected_risk”) or randomly (“selected_random”).
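A sketch of how the severe penalty codes listed above might be checked on a case's attributes. It rests on assumptions not confirmed by the page: that reason codes appear as attributes named `penalty_<code>` holding boolean values, and that a plain dictionary stands in for the trace attributes. The function names and the `include_q2_codes` switch are hypothetical.

```python
# Codes named as more severe in the text above.
SEVERE_CODES = frozenset({"B3", "B4", "B5", "B6", "B16", "BGK", "C16", "JLP3", "V5"})
# Additional codes the text lists for "Q2".
Q2_SEVERE_CODES = frozenset({"BGP", "BGKV", "B5F"})


def penalty_codes(attributes):
    """Return the set of reason codes flagged True, skipping penalty_amount{x}."""
    prefix = "penalty_"
    return {
        name[len(prefix):]
        for name, value in attributes.items()
        if name.startswith(prefix)
        and not name.startswith("penalty_amount")
        and value is True  # assumption: reason codes are stored as booleans
    }


def has_severe_penalty(attributes, include_q2_codes=False):
    """True if any flagged reason code is in the severe set."""
    severe = SEVERE_CODES | Q2_SEVERE_CODES if include_q2_codes else SEVERE_CODES
    return bool(penalty_codes(attributes) & severe)
```

Keeping the Q2 codes behind a switch leaves it to the analyst to decide when the extended list applies.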