Academic stories: Martin Kabierski
Narrated by Martin Kabierski
My name is Martin Kabierski. I currently work as a postdoctoral researcher in the research group “Workflow Systems and Technologies” at the University of Vienna in Austria. The group consists of five researchers: Han van der Aa as head, me and Anton Yeshchenko as postdoctoral researchers, and Sana Dodangeh and Bernold Abarca as predoctoral researchers. As of this year, the university is ranked among the top 100 universities in the Times Higher Education Ranking, as the highest-ranked Austrian university, and it has been home to many brilliant researchers, such as Anton Zeilinger, Emmanuelle Charpentier, Karl Popper, and Erwin Schrödinger.
Before joining the group in July of this year, I pursued my PhD in the “Databases and Information Systems” research group at Humboldt-Universität zu Berlin, led by Matthias Weidlich, from 2020 to 2024, and in the research group “Security and Transparency of Digital Processes” at the Weizenbaum Institute, Berlin, led by Jan Mendling, from 2024 until my departure to Vienna. From this time, you will also find a few research articles under my old last name, “Bauer”, with which I started my research endeavors but which I changed to “Kabierski”, the last name of my wife. So, for anyone wondering: no, I am, in fact, not a Polish citizen.
Thinking back, the need for process mining first occurred to me implicitly during my Bachelor's courses in Computer Science at Humboldt-Universität zu Berlin. During that time, I took a modelling course with Wolfgang Reisig, and while I found modelling very interesting, I also found it very time-consuming. This led me to wonder whether there were tools for automated modelling, based on some inputs, that could speed up the entire process. The answer came relatively quickly, as Matthias Weidlich joined the Institute of Computer Science as a Junior Professor at that time and showed me that such support indeed exists, that it goes well beyond the simple discovery of models and processes, and that this toolbox is called process mining. This, to me, was the missing puzzle piece that sparked my interest in the field, proved to me the enormous potential of process mining, and eventually led me down the road of pursuing a PhD in that same area.
My research interests mostly revolve around the representativeness of event data. That is, I investigate how we can quantify how well the available event logs represent the business processes from which they are generated, how sensitive process analysis tasks are to potential issues arising from that, and whether we can be more efficient in our analyses by looking only at small samples of the available data. If the data at hand does not correctly capture all aspects of the process that are relevant for the intended analysis, and if our algorithms are not capable of dealing with such discrepancies, there is no way of verifying whether what comes out at the end of the analysis pipeline is actually trustworthy; in other words: garbage in, garbage out. I started this line of research back in 2017 with my Bachelor's thesis, which was about a sampling algorithm for making process discovery more efficient. There, I realized that, depending on the task at hand, just 1% to 10% of the event data was enough to arrive at an accurate process model. This work led to my first conference visit, at CAiSE 2018 in Tallinn, Estonia, where I presented the results of the thesis as part of the main track program [1]. For me, this was a special moment, as I had never been to a research conference before; I was just starting my Master's program at the time. I kept working as a student assistant for the “Databases and Information Systems” group, where I could continue this research in parallel with my Master's courses. During that time, I also received an offer to start as a PhD student in the same group, as part of a DFG-funded research project.
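To give a flavor of the underlying idea, here is a minimal sketch of uniform trace sampling; the trace encoding, variant distribution, and 5% fraction are hypothetical illustrations, not the algorithm from [1]:

```python
import random

def sample_traces(log, fraction=0.05, seed=0):
    """Draw a uniform random sample of traces from an event log."""
    rng = random.Random(seed)
    k = max(1, int(len(log) * fraction))  # sample size, at least one trace
    return rng.sample(log, k)

# Hypothetical log: 1000 traces drawn from four variants with skewed frequencies
variants = ["abce", "abde", "acbe", "abcde"]
gen = random.Random(1)
full_log = gen.choices(variants, weights=[50, 30, 15, 5], k=1000)

sample = sample_traces(full_log, fraction=0.05)
print(len(sample), "of", len(full_log), "traces sampled")
print("variants covered:", len(set(sample)), "of", len(set(full_log)))
```

Because frequent variants dominate the log, even a small uniform sample tends to cover most of the common behavior, which is why a discovery algorithm run on the sample can still produce an accurate model.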
During my PhD, I was fortunately given the opportunity to dive into topics that interested me beyond the main focus of my project. For instance, we published some work on the privacy-protected evaluation of process performance indicators [2], and I supported fellow colleagues in their privacy-related research [3]. Nonetheless, the statistical and sample-based analysis of business processes remained one of my main areas of research. During the first years of my PhD, my focus remained on the design of efficient sampling algorithms, now with an emphasis on alignment-based conformance checking [4,5].
Eventually, this focus shifted towards the business process itself and how to relate event data to it. In particular, the question of whether we can somehow capture how much of the possible behavior of a business process is contained in an event log sparked my interest. As it turns out, this surprisingly hard-sounding problem had already been solved elsewhere decades ago: in biodiversity research. When looking over a paper that I had printed out of sheer curiosity and that had been lying on my desk for well over a year, I was amazed to find that biodiversity researchers have long been aware of the statistical properties underlying collected samples, of how to derive statistically meaningful insights from these incomplete collections, and of how to quantify the completeness of samples, which was just what I wanted to know. As it turned out, we could directly apply these methods, established in the biodiversity community for roughly 60 years, to estimate how complete event logs are. This work, which I presented at ICPM 2023 in Rome, was awarded the Best Paper award, a recognition that showed me the importance of this topic [6,7]. Furthermore, my doctoral thesis, summarizing the contributions around the topic of representativeness of event data, was just recently awarded the Best PhD Thesis award at ICPM 2025 in Montevideo, for which I feel very grateful. Both recognitions cemented two key insights for me. First, there is a lot to be gained by broadening one's own horizon and actively looking at other communities and the problems they try to solve; in the best case, whatever you are trying to solve has already been solved somewhere else. Second, simply analyzing our processes based on the data, assuming the data is a perfect representation of the process, is not going to cut it. We have to ground our data in the systems from which it originates and keep in mind where and how discrepancies may arise.
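To illustrate how such species estimators transfer to event logs, here is a minimal sketch based on the classic Chao1 richness estimator; the choice of Chao1 and the toy log are my own illustration and not necessarily the exact method of [6,7]:

```python
from collections import Counter

def chao1_completeness(traces):
    """Estimate event-log completeness via the Chao1 species-richness estimator.

    Each distinct trace variant is treated as one 'species'; the estimator
    extrapolates from rare variants how many unseen variants likely remain.
    """
    counts = Counter(traces)                        # variant -> frequency
    s_obs = len(counts)                             # observed variants
    f1 = sum(1 for c in counts.values() if c == 1)  # singleton variants
    f2 = sum(1 for c in counts.values() if c == 2)  # doubleton variants
    if f2 > 0:
        s_est = s_obs + f1 * f1 / (2 * f2)          # standard Chao1
    else:
        s_est = s_obs + f1 * (f1 - 1) / 2           # bias-corrected form
    return s_obs / s_est                            # fraction of variants seen

# Hypothetical log: each string is one trace (a sequence of activities)
log = ["abc", "abc", "abc", "abd", "abd", "acd", "abe"]
print(round(chao1_completeness(log), 3))  # → 0.667
```

Here, two variants occur only once and one occurs twice, so Chao1 estimates two additional unseen variants (six in total), yielding a completeness of 4/6 for the four variants observed.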
Only then can we be aware of possible issues or uncertainties inherent to the data that would otherwise remain obscured. This goes beyond the sample-based nature of event logs: it also includes aspects of data quality management and the many implicit decisions that drive data preparation, two topics that I have been investigating since starting my postdoc position in Vienna.
I firmly believe that the data-driven analysis of processes has a lot of potential for signaling improvement opportunities and unearthing relevant patterns in processes. Yet, to advance process mining as a field, I believe we must also pay critical attention to those cases where, due to “bad” data or other related issues, no real value could be obtained from process mining initiatives. Again, any process mining initiative can only be as good as the data that is available. This becomes even more important for the more complex object-centric data format that is currently evolving. Systematic support for data preprocessing steps, the detection of data quality issues, and the derivation of issue-aware algorithms can all work together to ensure reliable analysis results. I believe much is still to be learned in these areas, and I look forward to contributing; I hope that others will do so as well.
Regarding upcoming events, I am certainly looking forward to BPM 2026 in Toronto, organized by Arik Senderovich's team at York University, where I have the honor of serving as one of the Demo track chairs. Other than that, I hope to be able to go to CAiSE in Verona next year, but this, of course, will depend on the outcome of the review process.
[1] How Much Event Data Is Enough? A Statistical Framework for Process Discovery. https://doi.org/10.1007/978-3-319-91563-0_15
[2] Hiding in the forest: Privacy-preserving process performance indicators. https://doi.org/10.1016/j.is.2022.102127
[3] Semantics-aware mechanisms for control-flow anonymization in process mining. https://doi.org/10.1016/j.is.2023.102169
[4] Sampling and approximation techniques for efficient process conformance checking. https://doi.org/10.1016/j.is.2020.101666
[5] Sampling What Matters: Relevance-guided Sampling of Event Logs. https://doi.org/10.1109/ICPM53251.2021.9576875
[6] Addressing the Log Representativeness Problem using Species Discovery. https://doi.org/10.1109/ICPM60904.2023.10272004
[7] Quantifying and relating the completeness and diversity of process representations using species estimation. https://doi.org/10.1016/j.is.2024.102512
This article was last updated on December 10, 2025, at 17:21.
