Process Discovery Contest (PDC)
Background
Process mining is a relatively young research discipline that sits between computational intelligence and data mining on the one hand, and process modeling and analysis on the other. The idea of process mining is to discover, monitor, and improve real processes (i.e., not assumed processes) in a variety of application domains by extracting knowledge from event logs readily available in today's (information) systems. There are two main drivers for the growing interest in process mining. First, more and more events are being recorded, providing detailed information about the history of processes. Second, there is a need to improve and support business processes in competitive and rapidly changing environments. The lion’s share of attention in process mining has been devoted to process discovery, namely extracting process models (mainly business process models) from an event log.
Objectives and Context
The Process Discovery Contest (PDC) is dedicated to the assessment of tools and techniques that discover business process models from event logs. The objective is to compare the efficiency of techniques in discovering process models that strike a proper balance between “overfitting” and “underfitting”. A process model is overfitting (the event log) if it is too restrictive, disallowing behavior that is part of the underlying process. This typically occurs when the model allows only the behavior recorded in the event log. Conversely, a process model is underfitting (the reality) if it is not restrictive enough, allowing behavior that is not part of the underlying process. This typically occurs when the model overgeneralizes the example behavior in the event log.
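The distinction can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not part of the contest material: the toy process (a, then b and c in either order, then d), the one-trace event log, and the two deliberately extreme models.

```python
# Hypothetical underlying process: a, then b and c in either order, then d.
UNDERLYING_BEHAVIOR = {("a", "b", "c", "d"), ("a", "c", "b", "d")}

# Hypothetical event log: only one of the two possible orderings was recorded.
EVENT_LOG = [("a", "b", "c", "d")]


def overfitting_model(trace):
    """Accept only traces that were literally recorded in the event log."""
    return tuple(trace) in set(EVENT_LOG)


def underfitting_model(trace):
    """Accept any sequence built from activities seen in the event log."""
    alphabet = {activity for logged in EVENT_LOG for activity in logged}
    return all(activity in alphabet for activity in trace)


unseen_but_valid = ("a", "c", "b", "d")  # part of the process, not in the log
invalid = ("d", "a")                     # not part of the process

print(overfitting_model(unseen_but_valid))  # False: too restrictive
print(underfitting_model(invalid))          # True: not restrictive enough
```

The first model rejects a trace that belongs to the underlying process, and the second accepts a trace that does not. A well-balanced model would sit between these two extremes.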
Positioning of the Process Discovery Contest
The only other contest related to process mining is the annual Business Process Intelligence Challenge (BPIC). The BPIC uses real-life data without objective evaluation criteria: it is about the perceived value of the analysis and is not limited to the discovery task (it also covers conformance checking, performance analysis, etc.). The submitted report is evaluated by a jury. The Process Discovery Contest is different. The focus is on process discovery. Synthetic data are used so that an objective “proper” answer exists. Process discovery is turned into a classification task with a training set and a test set. A discovered process model needs to decide, for each trace, whether it is fitting or not.
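The classification view can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the contest's evaluation procedure: the traces and labels are made up, a simple directly-follows model stands in for a real discovery technique, and plain accuracy is used as the score because the actual scoring rule is not specified here.

```python
def discover(training_log):
    """Learn start activities, end activities, and directly-follows pairs.

    Stand-in for a real discovery technique (assumption for illustration).
    """
    starts = {trace[0] for trace in training_log}
    ends = {trace[-1] for trace in training_log}
    follows = {pair for trace in training_log for pair in zip(trace, trace[1:])}
    return starts, ends, follows


def fits(model, trace):
    """Classify a trace as fitting (True) or not fitting (False)."""
    starts, ends, follows = model
    if not trace:
        return False
    return (
        trace[0] in starts
        and trace[-1] in ends
        and all(pair in follows for pair in zip(trace, trace[1:]))
    )


def accuracy(model, labelled_test_set):
    """Fraction of test traces whose classification matches the label."""
    correct = sum(fits(model, trace) == label for trace, label in labelled_test_set)
    return correct / len(labelled_test_set)


# Hypothetical training set: traces produced by the underlying process.
training_log = [("a", "b", "c", "d"), ("a", "c", "b", "d")]

# Hypothetical test set: (trace, True if it belongs to the underlying process).
test_set = [
    (("a", "b", "c", "d"), True),
    (("a", "c", "b", "d"), True),
    (("a", "b", "d"), False),
    (("d", "a"), False),
]

model = discover(training_log)
print(accuracy(model, test_set))  # 0.75
```

The one misclassified trace, ("a", "b", "d"), is accepted because every step in it was seen somewhere in the training log. The stand-in model therefore underfits slightly, which is exactly the kind of error such a test set is meant to expose.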
Data Sets
The data sets of earlier instances of the Contest are available on 4TU.ResearchData through the following DOIs:
- PDC 2016: 10.4121/14625912.v1
- PDC 2017: 10.4121/14625948.v1
- PDC 2019: 10.4121/14625996.v1
- PDC 2020: 10.4121/14626020.v1
- PDC 2021: 10.4121/16803232.v1
- PDC 2022: 10.4121/21261402.v1
- PDC 2023: 10.4121/afd6f608-469e-48f9-977d-875b45840d39.v1
Organizers
- Josep Carmona, Universitat Politècnica de Catalunya (UPC), Spain (2016 to 2020)
- Massimiliano de Leoni, University of Padua, Italy (2016 to 2019)
- Benoît Depaire, Hasselt University, Belgium (2016 to 2020)
- Toon Jouck, Hasselt University, Belgium (2016 and 2017)
- Eric Verbeek, Eindhoven University of Technology, The Netherlands (2020 and onwards)