Exploring newland: From Process Mining to Automated Process Improvement

An interview with Marlon Dumas

The European Research Council (ERC) Advanced Grants are among the most prestigious research grants in Europe. An ERC Advanced Grant provides funding of up to 2.5 million euros to an individual researcher to implement a five-year fundamental research project that has the potential to lead to groundbreaking advances in its field.

In 2019, the ERC awarded its first-ever grant in the field of process mining to Prof. Marlon Dumas from the University of Tartu. This grant is a recognition of the growing importance of process mining as a field of research on its own. We asked Marlon to tell us about his trajectory in the field of process mining and where he expects this grant to take him.

Tell us a bit about yourself and your research institute.

I am the leader of the Business Process Management (BPM) research group at the University of Tartu. We are a team of five researchers and seven PhD students. We conduct research on process mining, predictive process monitoring, business process privacy, and verifiable process execution.

When and why did you first come up with the idea to do research in process mining?

Between 2005 and 2013, my research agenda was focused on business process modeling and analysis. Together with colleagues at the Queensland University of Technology, we conducted research on process model verification, simulation, performance analysis, and process model versioning, comparison, and merging. This research eventually led to the development of Apromore, an open-source toolset for managing collections of process models (apromore.org).

Back in 2013, we saw great potential in integrating process modeling and process mining in a single platform. So we pivoted Apromore in the direction of process mining. First, we integrated existing process mining techniques into Apromore. But when testing these techniques with real-life data and showing the results to end-users, we were not satisfied with the state-of-the-art in process mining. We noticed that existing automated process discovery techniques, such as heuristics miner and inductive miner, led to incorrect or imprecise process models when applied to real-life logs. Also, existing conformance checking techniques were too focused on measuring the level of conformance between a model and a log on a 0-to-1 scale, rather than presenting mismatches between an event log and a process model in a user-friendly manner.

As researchers, we wanted Apromore to go beyond the state of the art. So we designed new process mining algorithms, such as the Split Miner algorithm for automated process discovery and the Behavioral Alignment approach for conformance checking, as well as techniques for the automated discovery of hierarchical process models, for fast and accurate identification of business process changes, and for the comparison of business process variants based on event logs.

These techniques are nowadays packaged in Apromore’s Community Edition. Since early 2019, we have put together a professional software development team, which has re-implemented many of these techniques. We have recently created a spin-off to commercialize this technology. There will be more news about Apromore’s commercial edition in the coming months.

What is predictive monitoring and how can process mining inform it?

Process mining is a family of tactical management tools. The end-users of process mining tools are analysts and managers, who use process mining to identify performance and conformance issues in a process, to assess the impact of these issues, and to understand their root causes. The end result of process mining is insights that help analysts and managers to make tactical decisions on how to change the process in the medium-term in order to enhance its performance.

In contrast, predictive process monitoring is a family of operational management tools. The end-users of a predictive process monitoring tool are process workers and operational managers. A predictive process monitoring tool uses machine learning models to warn you, for example, that a particular case of your order-to-cash process will end up delayed, or that it will lead to a customer complaint, a returned product, or a refund claim from the customer. Using these predictions, workers and their direct managers can make near-real-time decisions as to when and how to intervene in order to prevent negative outcomes (for instance, preventing customer complaints).

Together with colleagues at the University of Tartu and FBK Trento, we have developed an open-source predictive process monitoring engine called Nirdizati (nirdizati.org). Nirdizati allows analysts to train predictive models from event logs with minimal knowledge of machine learning. The resulting machine learning models can then be used to produce predictive monitoring dashboards. The main functionality of Nirdizati has been ported into Apromore Community Edition.
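To give a rough idea of what happens under the hood of such tools, here is a minimal sketch (illustrative code, not Nirdizati’s actual API) of one common way to turn a running case into model input: frequency-encoding the prefix of activities executed so far into a fixed-length feature vector.

```python
def encode_prefix(prefix, activity_vocab):
    """Frequency-encode a partial trace into a fixed-length vector:
    one count per known activity. Vectors like this, combined with
    case attributes, are typical inputs to a predictive model."""
    return [prefix.count(activity) for activity in activity_vocab]

# A running order-to-cash case that has executed three activities so far
# (activity names are hypothetical):
vocab = ["Create order", "Check credit", "Ship goods"]
features = encode_prefix(["Create order", "Check credit", "Check credit"], vocab)
# features == [1, 2, 0]
```

A classifier trained on such vectors, labeled with the known outcomes of completed cases, can then score running cases for the risk of a delay or a complaint.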

Tell us about your ERC Grant award. What are the expected outcomes of this project?

Current approaches to discover process improvement opportunities are expert-driven. In these approaches, data are used to assess opportunities derived from experience and intuition rather than to discover them in the first place. Moreover, since the assessment of opportunities is manual, an analyst can only realistically explore a fraction of the overall space of improvement opportunities.

The past two decades have seen an increased level of digitization of business processes, and with it, increased availability of fine-grained data about the execution of business processes. This data availability allows us to move from purely manual and expert-driven process improvement approaches, to more automated and data-driven approaches in which improvement opportunities are discovered and assessed systematically. 

In this ERC project, we will build the foundations of a new generation of process improvement methods that do not exclusively rely on guidelines and heuristics, but rather on a systematic exploration of a space of possible changes derived from process execution data. Specifically, we will develop algorithms to analyze process execution data in order to discover changes to a process that are likely to improve its performance. This includes changes in the control-flow dependencies between activities, partial automation of activities (for example, using robotic process automation bots), changes in resource allocation rules, or changes in decision rules that may reduce wastes or negative outcomes. 

The outputs of the project will be embodied in an open-source toolset called the Process Improvement Explorer (PIX). The PIX toolset will allow users to interactively explore spaces of process improvement opportunities with respect to a set of performance measures. By interactive, we mean that the user is able to start with a set of performance measures, provide the metadata required to construct the process improvement space, and explore the Pareto-optimal groups of changes. A group of changes is Pareto-optimal if there is no other group of changes that is better along one performance measure without simultaneously being worse along another measure. For example, let’s say that our objective is to reduce the cycle time of the process and to reduce its cost. A group of changes is Pareto-optimal if there is no other group of changes that can achieve both lower cycle time and lower cost simultaneously.
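The Pareto-optimality criterion can be made concrete with a small sketch (illustrative code, not part of PIX), where each group of changes is scored on two measures to be minimized, cycle time and cost:

```python
def dominates(a, b):
    """Group a dominates group b if a is at least as good on every
    measure (lower is better) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Hypothetical (cycle_time_hours, cost) scores for groups of changes:
print(dominates((40, 100), (50, 120)))  # True: better on both measures
print(dominates((40, 130), (50, 120)))  # False: a trade-off, so both
print(dominates((50, 120), (40, 130)))  # False: groups are Pareto-optimal
```

The Pareto-optimal groups are exactly those that no other group dominates.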

To illustrate the envisioned capabilities of PIX, consider the following scenario. An analyst is tasked with analyzing an order-to-cash process due to rising customer dissatisfaction. The execution of this process is supported by an information system that records the creation of each case of this process as well as the execution of each activity. As in process mining, the data collected by the information system can be extracted as an event log, consisting of a collection of event records. Each record has a timestamp, a reference to an activity, a case identifier, and additional attributes. Let’s assume also that the system records events that take place while an activity is performed, such as URLs accessed by a worker performing an activity, data entered into a field of a Web form by the worker, files and software applications opened (an event may indicate that a given Excel sheet is opened while the employee is verifying an order), and events occurring inside these applications (for example, when a cell in the spreadsheet is selected).
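The structure of such an event log can be sketched as follows (a minimal illustration; real logs are typically stored in formats such as XES or CSV, and all names below are hypothetical):

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Event:
    case_id: str
    activity: str
    timestamp: str                # ISO-8601 string, for simplicity
    attributes: dict = field(default_factory=dict)  # e.g. URLs, form fields

def group_into_traces(log):
    """Group event records into traces (one per case), ordered by time."""
    traces = defaultdict(list)
    for event in log:
        traces[event.case_id].append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e.timestamp)
    return dict(traces)
```

Grouping the records by case identifier and sorting them by timestamp yields one trace per case, which is the basic unit that process mining and, in this vision, PIX would operate on.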

To explore the space of improvement opportunities using PIX, the analyst will specify the performance measures of interest (for instance, the cycle time or the defect rate) as well as the allowed changes, such as: Which activities may be re-ordered? Which activities may be selectively executed/skipped? Which resource allocation rules may be changed and in what ways? Which activities may be automated? She may also specify performance constraints, like: the cycle time of the to-be process should not exceed a given threshold.
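Such input could be captured as a simple declarative specification. The structure below is purely illustrative (PIX does not define this format; every field name and value is a hypothetical example):

```python
# Hypothetical exploration specification for a PIX-style analysis;
# all field names, activities, and rules are illustrative examples.
exploration_spec = {
    "performance_measures": ["cycle_time", "defect_rate"],
    "allowed_changes": {
        "reorderable_activities": [("Check credit", "Verify order")],
        "skippable_activities": ["Manual approval"],
        "automatable_activities": ["Enter customs data"],
        "resource_allocation_rules": ["round_robin", "least_loaded"],
    },
    "constraints": {
        "max_cycle_time_hours": 72,  # to-be cycle time must not exceed this
    },
}
```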

Given this input, PIX will allow the analyst to navigate through the space of possible process changes, focusing on the most promising ones (the Pareto-optimal ones). The analyst will be able to alter the set of performance measures, constraints, and allowed changes, so as to enhance or prune this search space. Once she selects a group of changes, a specification of these changes is generated. If a change implies the automation of an activity, an executable model to automate the activity is generated. If it implies the redefinition of rules, the new ruleset is produced, and so on.

Enabling data-driven process improvement is also one of the goals of process mining. How does the vision of PIX relate to process mining?

Process mining techniques focus on discovering process models, analyzing their performance, and comparing process models and event logs. While these techniques provide insight into the “as is” state of the process, they do not tell us how to improve the process in order to achieve a given set of performance objectives.

Process mining techniques are very suitable when we need to figure out what factors might be contributing to a given performance issue. But process mining techniques do not help us to identify, evaluate, and explore possible improvement opportunities. Process mining is meant for “as is” analysis, not for “to-be” analysis. After using a process mining tool to identify bottlenecks, friction points, and root causes of performance issues, the analyst is left on her own. This is basically the ATAMO approach (And Then A Miracle Occurs). We hope that a light bulb will turn on in the analyst’s head. In reality, though, performance issues are too complex for analysts to manually explore the full space of remedial actions.

The goal of the PIX project is to fill the gap from “I have analyzed a performance issue and I know more or less where it comes from” to “I have found a suitable set of changes to address this performance issue”.

For example, think about a handful of customers in a region who are experiencing delays in receiving the products they have ordered. Naturally, these customers will start visiting the company’s website often to check the status of their delivery. PIX will detect that these customers are experiencing a delay and will track down the reason for these delays. It might be that a supplier is waiting for missing information to fill out a customs clearance document. PIX will propose ways of preventing this problem from happening in the future. For example, it might propose to add an item to the checklist used by employees in the warehouse, asking them to enter the required customs clearance information for every product that contains liquids.

Do you have any ideas or results on how to develop this sort of magic “idea-generation box”?

So far, we have developed two components of PIX. The first one is a tool for discovering simulation models, called Simod. Simod takes an event log as input and produces a simulation model from it. The simulation model is optimized for accuracy, which is essential because we will rely on the output of Simod to determine which process changes are better. In other words, Simod enables us to do “what-if” analysis in a fine-grained and accurate manner.

The second ingredient is an optimization engine that takes as input an event log and generates all sorts of “candidate changes” that are likely to improve a process with respect to multiple performance measures (for example, processing time, cycle time, cost). This optimization engine invokes Simod in order to determine to what extent a given change improves (or degrades) the performance of the process along multiple dimensions. The optimization engine then builds a Pareto-optimal set of changes with respect to a set of performance measures.
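The interplay between the two components can be sketched as an evaluate-and-filter loop. The code below is an illustration of the idea only: `toy_simulator` is a hypothetical stand-in for a Simod-generated simulation model, and the candidate changes and scores are invented.

```python
def dominates(a, b):
    """a dominates b if a is at least as good on every performance
    measure (lower is better) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_optimal_changes(candidates, simulate):
    """Score each candidate change with the simulator, then keep only
    the changes whose scores no other candidate dominates."""
    scored = [(change, simulate(change)) for change in candidates]
    return [(change, score) for change, score in scored
            if not any(dominates(other, score) for _, other in scored
                       if other != score)]

# Hypothetical stand-in for Simod: maps a change to (cycle_time, cost).
toy_simulator = {"automate activity A": (80, 60),
                 "skip activity B": (90, 40),
                 "reorder activities C, D": (95, 55)}.get

candidates = ["automate activity A", "skip activity B", "reorder activities C, D"]
front = pareto_optimal_changes(candidates, toy_simulator)
# front keeps "automate activity A" and "skip activity B";
# "reorder activities C, D" is dominated by "skip activity B".
```

Calling the simulator inside the loop is what makes the exploration data-driven: each candidate change is assessed on the process’s own execution data rather than on intuition.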

It’s early days. But one step at a time, we hope to make some progress towards the long-term vision of automated process improvement.