Title details
Buss, Alina ; Kecht, Christoph ; Kratsch, Wolfgang ; Röglinger, Maximilian ; Sadeghianasl, Sareh ; Wynn, Moe T.:
Process Mining Between the Lines : Extracting Object-Centric Event Logs From Textual Data.
In: Information Systems. Vol. 140 (2026). - 102713.
ISSN 0306-4379
DOI: https://doi.org/10.1016/j.is.2026.102713
Project information
| Project funding: | QUAPRO |
|---|---|
Abstract
Organizations generate vast amounts of unstructured textual data – a valuable source of information that frequently remains underutilized for process mining. However, textual descriptions often record exceptions and manual activities absent from structured data, and therefore, enable a better understanding of deviations from the expected business process behavior. Importantly, unstructured sources typically retain the object-centric characteristics of real-world processes – information that gets flattened or lost in case-centric event logs. Yet, existing approaches primarily target structured data sources or produce case-centric event logs. To address this gap, we present an automated approach to derive object-centric event logs directly from unstructured textual descriptions. The approach comprises two subcomponents: a collector that identifies events and objects (including their attributes and relationships), and a refiner that consolidates and cleans the extracted information. We instantiate each subcomponent in heuristic and generative implementations and create four pairwise combinations of collector and refiner instances to assess the effectiveness of heuristic natural language processing and generative artificial intelligence techniques. We compare these variants quantitatively and qualitatively in a controlled, artificial setting based on synthesized texts and demonstrate the practical utility on two naturally occurring corpora (fire status updates and a legal judgment). Our results show that the configurations with a generative collector achieve the highest extraction quality. In particular, the fully generative variant produces coherent and standardized event and object labels. Overall, this study fills a notable research gap by enabling the incorporation of textual information into process mining applications.
