Academic stories: Dirk Fahland


Talking with Dirk Fahland

Dirk Fahland is an associate professor with the Process Analytics group at the Eindhoven University of Technology (TU/e). In this interview, we talk with him about current research challenges, the future of process mining from multiple perspectives, and how to be a serial best-paper award winner at ICPM!

Dirk, tell us a bit about yourself and your current area of research.

At present, we are focussing on multi-dimensional data. It all came to fruition after Wil left, actually, but that is only correlation, not causation! In fact, Wil put me on the topic when I entered the group to work on the ACSI EU project. We were interested in analysing data that were related to more than one case. The turning point for me came with the Vanderlande project on Process Mining in Logistics, where I was appointed project manager for the scientific side. At Vanderlande, they had an innocent question: they wanted to apply process mining to improve the processes in their material handling systems (MHS), such as baggage handling and warehouse automation. For one year, we had barely any results to show, as process mining techniques did not really reveal the dynamics that mattered: how multiple cases going through the same part of the system affect how each of them is processed – think of short-term bottlenecks and re-routing. That changed when Vadim Denisov came up with the performance spectrum (also available on GitHub). It was originally devised to evaluate simulations of individual bags against historic data. Suddenly, we saw things. We got a sense of what it means to have a complete view of an entire system at once – seeing how all individual cases together form performance patterns we had not seen before.
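To make the idea concrete, here is a minimal sketch of what the performance spectrum looks at (with invented data and names, not Vadim's implementation): for one segment between two observation points a and b, it collects, per case, when the case was observed at a and at b. Plotting each case as a line between those two time points then exposes overtaking, FIFO violations, and short-term bottlenecks across all cases at once.

```python
# Hypothetical data layout for one "segment" of a material handling system:
# the pair of consecutive observation points a -> b. For each case we record
# the time it was seen at a and at b.
from collections import defaultdict

events = [  # (case_id, observation_point, timestamp in seconds)
    ("bag1", "a", 0), ("bag1", "b", 60),
    ("bag2", "a", 10), ("bag2", "b", 50),   # bag2 overtakes bag1
    ("bag3", "a", 20), ("bag3", "b", 90),
]

times = defaultdict(dict)
for case, point, ts in events:
    times[case][point] = ts

# one line per case in the spectrum: from (t_a, level 0) to (t_b, level 1)
segment = [(case, t["a"], t["b"]) for case, t in times.items()
           if "a" in t and "b" in t]

for case, t_a, t_b in sorted(segment, key=lambda x: x[1]):
    print(f"{case}: entered a at {t_a:>3}s, reached b at {t_b:>3}s "
          f"(duration {t_b - t_a}s)")
```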

During that project, and looking back at my past research, I started wondering what my main area of investigation was really about. It took me half a year to find an answer that summarised it all nicely: everything I had been investigating had the concept of “multiple” in common. I am now trying to structure this problem space of multiple behavioural dimensions so as to describe it properly. Notice that this is not an easy task, considering our usual work based on sequences and isolated cases – we have to collectively unlearn past mistakes.

In the meantime, we came up with a graph-based approach to manage event data. As we stored event graphs in graph databases, some colleagues observed that those were in fact knowledge graphs derived from event data. This led us to the idea of a “thinking assistant”. When you look beyond classical business intelligence questions with process mining and work with industry, you realise you need something better than a dashboard that presents you with process maps, statistics about KPIs, and how they change over time – especially when you consider dynamics over multiple objects and cases at once. You need a more advanced platform that actively supports engineers in solving problems.
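As an illustration of the general idea – a hedged sketch with an assumed schema, not the group's actual implementation – events can be stored as nodes of a graph, correlated (CORR) to the entities they refer to, with one directly-follows (DF) chain per entity rather than per case:

```python
# Sketch of an event knowledge graph: event nodes, CORR edges from events to
# entity nodes, and per-entity DF edges chaining that entity's events in time
# order. Data and schema are illustrative.
import networkx as nx
from collections import defaultdict

events = [  # (event_id, activity, timestamp, entities the event refers to)
    ("e1", "Create Order", 1, {"Order": "o1"}),
    ("e2", "Pick Item",    2, {"Order": "o1", "Item": "i1"}),
    ("e3", "Pick Item",    3, {"Order": "o1", "Item": "i2"}),
    ("e4", "Pack Items",   4, {"Item": "i1"}),
]

g = nx.MultiDiGraph()
per_entity = defaultdict(list)
for eid, act, ts, ents in events:
    g.add_node(eid, activity=act, time=ts)
    for etype, ent in ents.items():
        g.add_node(ent, entity_type=etype)
        g.add_edge(eid, ent, rel="CORR")      # event refers to entity
        per_entity[ent].append((ts, eid))

# one DF chain per entity: its events ordered by time
for ent, evs in per_entity.items():
    evs.sort()
    for (_, e_prev), (_, e_next) in zip(evs, evs[1:]):
        g.add_edge(e_prev, e_next, rel="DF", entity=ent)

print(sorted((u, v, d["rel"]) for u, v, d in g.edges(data=True)))
```

The same structure maps directly onto a property-graph database such as Neo4j, where such queries over multiple entities become natural.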

All of a sudden, I realised the two aspects somehow came together: on the one hand, the fundamental research challenge on “multiples” and, on the other hand, the very applied question of how to make this accessible to users in practice. I recently summarised this paradigm shift we learned in the Vanderlande project – from process mining to thinking assistants using system knowledge graphs – in a talk at the EAISI summit 2021.

When and why did you first come up with the idea to do research in process mining?

My first real research in process mining started at the end of my PhD. Back then I was into process modelling and verification – not much to do with event data yet: I mainly used partial-order and unfolding techniques. I was on a research visit at TU/e, and the only algorithms I could use back then to obtain fitting models from real data were the ILP miner and the transition system miner, mostly known for the lovely spaghetti models they produce. The heuristic miner implementation in ProM 6 was still in its infancy (and, with my formal background, I could not understand how partially fitting models made sense). I had unfoldings, my hammer, and those complicated models were the perfect nail – so I looked into how I could simplify these spaghetti models, succeeded, and received my first “best paper award” for it. I was in an environment where process mining was much appreciated, of course, and I had the chance to challenge it from my angle.

What is your favourite “Eureka!” moment you had while conducting your research?

My “Eureka!” moment came as early as November 2010 – a long time ago! I was studying how to describe behaviour with multiple case identifiers. I had just been reading the thesis of Ronny Mans, who had investigated that kind of process in healthcare. The models he produced used Wil’s very old Proclet model (from 2000!), but he essentially saw the need to model the behaviour between two objects as an object of its own. I then connected these ideas to the databases in which our event data was stored and – to talk about multiple behavioural dimensions – came up with what I am still doing today: event graphs and synchronous composition (see this slide deck from 2010!).

At that time, there were almost no graph databases, apart from early fundamental research results. Of course, RDF triple stores were around, but they could not do what I needed them to do. It took nine or ten years until someone showed Neo4j to me!

But coming to think of it – the most important insight that made this thinking possible was when Carl Adam Petri himself explained to me what Petri nets actually are.

How do you see process mining in the future?

In the near future, I think we will no longer analyse processes as isolated cases and individual processes. I am not saying that process discovery, conformance checking, or prediction along case identifiers will disappear – they remain building blocks. However, I find it rather stupid to analyse even single cases while ignoring multiple perspectives at once. In fact, to actually improve a process we should look at its context and the network it creates: the organisation, the business objects, the behavioural features of both, and so on. I see this paradigm shift as a great opportunity, since there are so many open problems out there! After all, most of the transactional data we record already allow us to build such networks: they report who the actors are, how objects are altered, and many additional details – the information is already there. So let’s not ignore these details. That was one of the points discussed during the XES 2.0 workshop. Once we have rich data properly linked and represented, we have a better grip on domain knowledge modelling. In turn, such a rich knowledge graph can boost our process improvement and support! For example, from a plain event log with resources logged, we can already identify complex task execution patterns, routines, and habits of an organisation – however, you have to look at two behavioural dimensions at once to see it, and we now have the tools to do so (see the sketch below).
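As a hedged illustration of looking at two behavioural dimensions at once (toy data, not an actual algorithm from this line of work): building directly-follows relations both per case and per resource and intersecting them already surfaces candidate task executions, i.e. consecutive steps of one case performed by the same actor.

```python
# From a plain log with case and resource attributes, build directly-follows
# pairs along two dimensions. Where a case-DF step and a resource-DF step
# coincide, one actor performed consecutive activities of one case - a
# candidate "task" instance. Data is illustrative.
from collections import defaultdict

log = [  # (case, activity, resource, timestamp), already time-ordered
    ("c1", "Receive", "alice", 1),
    ("c1", "Check",   "alice", 2),   # alice does two steps of c1 in a row
    ("c2", "Receive", "bob",   3),
    ("c1", "Approve", "bob",   4),
]

def df_pairs(key_idx):
    """Directly-follows event pairs along one dimension (case or resource)."""
    traces = defaultdict(list)
    for i, ev in enumerate(sorted(log, key=lambda e: e[3])):
        traces[ev[key_idx]].append(i)
    return {(a, b) for t in traces.values() for a, b in zip(t, t[1:])}

case_df, resource_df = df_pairs(0), df_pairs(2)
tasks = case_df & resource_df  # steps that directly follow in BOTH dimensions
print("candidate task steps:", tasks)  # {(0, 1)} -> alice's Receive -> Check
```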

Any closing remarks on how to win the best paper award at ICPM, as you won two in a row?

Both papers have in common that they use graphs, but that’s just a coincidence – again, correlation and not causation. Graphs happen to be a good tool, though. The paths to the two papers are quite different. The ICPM 2020 one is on detecting cascades. I never expected to be doing that sort of research or writing such a paper until we saw the ideas materialise. We were exposed to the material handling systems, and we happened to hire a very smart PhD student who had this idea. I had to talk to many people, including Marwan Hassani and Wil, to conceptualise the problem. What made the paper work was that the key idea (building higher-level events across events of classical cases and then correlating them), though simple, could be applied consistently and repeatedly. That is probably why it attracted positive feedback from the reviewers.
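To illustrate the stated key idea in toy form (thresholds and data are invented; this is not the paper's algorithm): low-level events of many cases can be aggregated into higher-level events, which can then themselves be ordered and correlated to reveal cascades.

```python
# Aggregate low-level events into higher-level "busy" events per time window
# and system part, then correlate consecutive high-level events to spot
# congestion moving from one part to another. Purely illustrative.
from collections import Counter

events = [  # (timestamp, segment) - low-level events of many cases
    (1, "s1"), (1, "s1"), (2, "s1"), (2, "s1"), (2, "s1"),
    (3, "s2"), (3, "s2"), (4, "s2"), (4, "s2"), (4, "s2"),
]

load = Counter(events)  # events per (time, segment)
THRESHOLD = 2  # "busy" if more than 2 events in one time unit

high_level = sorted((t, seg) for (t, seg), n in load.items() if n > THRESHOLD)
print("high-level 'busy' events:", high_level)

# correlate consecutive high-level events: busyness moving s1 -> s2
cascades = [(a, b) for a, b in zip(high_level, high_level[1:]) if a[1] != b[1]]
print("candidate cascade steps:", cascades)
```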

The idea behind the ICPM 2021 paper on process discovery using graph neural networks had been in my mind for five years already. I wanted to train neural networks on process data for automated process discovery. It remained on my agenda until another brilliant person came along (this time, a Master’s student). I needed someone who was very skilled in deep learning – and he was, though we needed some time to understand one another’s viewpoints. The notions of a “model” in process mining and in deep learning are, in fact, fundamentally different.

Overall, I see clarity and understandability as key ingredients of good papers. The manuscript has to flow. This was especially true in these cases: I saw the ideas were really good and, as they turned out to be rather uncommon, wanted to make sure they were explained well. By the way, I think this is another crucial factor: both contributions are completely novel – no reiterations of existing concepts. I must say I am very happy to have managed to work on substantially novel solutions, and even to have obtained good results!