Title information
Nandini, Durgesh ; Blöthner, Simon ; Larch, Mario ; Schönfeld, Mirco:
Organising Knowledge from Text: Prompt-Based Triple Extraction and Graph Enrichment with Large Language Models.
In: Balke, Wolf-Tilo ; Golub, Koraljka ; Manolopoulos, Yannis ; Stefanidis, Kostas ; Zhang, Zheying (Eds.):
Linking Theory and Practice of Digital Libraries : 29th International Conference on Theory and Practice of Digital Libraries, TPDL 2025, Tampere, Finland, September 23–26, 2025, Proceedings. - Cham : Springer, 2025. - pp. 53-70. - (Lecture Notes in Computer Science ; 16097)
ISBN 978-3-032-05409-8
DOI: https://doi.org/10.1007/978-3-032-05409-8_5
Project information

| | |
|---|---|
| Project title: | Berücksichtigung von kontextuellen Faktoren und strukturellen Gegebenheiten in einem dynamischen Rahmen (KONECO) |
| Project ID: | 16DKWN095 |
| Project funding: | Bundesministerium für Bildung, Forschung, Technologie und Raumfahrt (BMFTR) |
Abstract
Creating and organising structured knowledge from unstructured text is a fundamental problem in computer science, with applications ranging from search and reasoning to prediction and recommendation systems. Knowledge Graphs (KGs) offer a powerful abstraction for organising information as entities and their semantic relationships, but they depend on high-quality, domain-specific relational data. Extracting this data remains particularly challenging when source texts are lengthy, formal, and context-dependent. This work presents a framework combining prompt-engineered Large Language Models (LLMs) with knowledge graph embedding methods to support automatic knowledge creation and integration. We explore different prompting strategies (zero-shot, one-shot, few-shot, and prompts with negative examples) for extracting subject–predicate–object triples from complex legal texts. The extracted triples are used to build a supplemental KG, which is merged with an existing domain-specific graph and embedded using TransE for downstream relation prediction tasks. As a use case, we focus on the trade domain and use Regional Trade Agreements (RTAs) as the primary source of textual data. RTAs are challenging documents: their legally formal and often ambiguous language requires domain knowledge to interpret effectively. Yet they encode rich contextual information about international economic relations. Our results show that LLM-extracted triples meaningfully enhance the semantic structure of the KG and improve predictive performance. We further identify which categories of predicates contribute most to these gains.
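
The record contains no code, so purely as an illustration of the extraction step described in the abstract, the following is a minimal sketch of few-shot prompting for subject–predicate–object triple extraction. The prompt wording, the example sentences, the JSON output format, and the `call_llm` placeholder are assumptions for this sketch, not the authors' implementation.

```python
# Illustrative sketch only: few-shot prompting for subject-predicate-object
# triple extraction from treaty-style text. `call_llm` is a placeholder for
# any chat-completion client; prompt wording and output format are assumed.
import json
from typing import Callable, List, Tuple

FEW_SHOT_EXAMPLES = """\
Text: "The Parties shall eliminate customs duties on originating goods."
Triples: [["Parties", "eliminate", "customs duties"]]

Text: "This Agreement enters into force sixty days after ratification."
Triples: [["Agreement", "enters into force after", "ratification"]]
"""

def build_prompt(passage: str) -> str:
    """Assemble a few-shot prompt asking for JSON-encoded triples."""
    return (
        "Extract subject-predicate-object triples from the text.\n"
        "Answer with a JSON list of [subject, predicate, object] lists.\n\n"
        f"{FEW_SHOT_EXAMPLES}\n"
        f'Text: "{passage}"\n'
        "Triples:"
    )

def extract_triples(passage: str,
                    call_llm: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Send the prompt to an LLM and parse the returned JSON list of triples."""
    raw = call_llm(build_prompt(passage))
    try:
        return [tuple(t) for t in json.loads(raw) if len(t) == 3]
    except json.JSONDecodeError:
        return []  # malformed model output is simply skipped
```

A zero-shot variant would drop `FEW_SHOT_EXAMPLES`, and a variant with negative examples would add counter-examples of triples that should not be extracted; the parsing step stays the same.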

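For the downstream relation prediction step the abstract names TransE, which embeds entities and relations in one vector space and scores a triple (h, r, t) by how closely h + r approximates t. Below is a minimal NumPy sketch of that scoring idea; the toy dimensionality, random vectors, and entity names are illustrative placeholders, not the configuration or data used in the paper.

```python
# Illustrative TransE sketch: a triple (h, r, t) is scored by the negative
# L2 distance between h + r and t. Toy dimensions and random vectors only.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entities = {name: rng.normal(size=dim) for name in ["EU", "Canada", "CETA"]}
relations = {name: rng.normal(size=dim) for name in ["party_to", "signed_with"]}

def transe_score(h: str, r: str, t: str) -> float:
    """TransE plausibility: negative distance of h + r from t (higher is better)."""
    return -float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

def rank_relations(h: str, t: str):
    """Relation prediction: rank all candidate relations for an entity pair."""
    return sorted(relations, key=lambda r: transe_score(h, r, t), reverse=True)

print(rank_relations("EU", "CETA"))
```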