ERef Bayreuth

Generative Curricula for Multi-Agent Path Finding via Unsupervised and Reinforcement Learning

Title details

Phan, Thomy ; Phan, Timy ; Koenig, Sven:
Generative Curricula for Multi-Agent Path Finding via Unsupervised and Reinforcement Learning.
In: Journal of Artificial Intelligence Research. Vol. 82 (2025), pp. 2471-2534.
ISSN 1943-5037
DOI: https://doi.org/10.1613/jair.1.17403

Full text

Link to full text (external URL):

Project information

Project funding:	Other: National Science Foundation (NSF) under grant numbers 1817189, 1837779, 1935712, 2121028, 2112533, and 2321786, as well as gifts from Amazon Robotics and the Donald Bren Foundation.

Abstract

Multi-Agent Path Finding (MAPF) is the challenging problem of finding collision-free paths for multiple agents, which has a wide range of applications, such as automated warehouses, smart manufacturing, and traffic management. Recently, machine learning-based approaches have become popular in addressing MAPF problems in a decentralized and potentially generalizing way. Most learning-based MAPF approaches use reinforcement and imitation learning to train agent policies for decentralized execution under partial observability. However, current state-of-the-art approaches suffer from a prevalent bias toward micro-aspects of particular MAPF problems, such as congestion in corridors and potential delays caused by single agents, leading to tight specializations through extensive engineering via oversized models, reward shaping, path finding algorithms, and communication. These specializations are generally detrimental to the sample efficiency, i.e., the learning progress given a certain amount of experience, and generalization to previously unseen scenarios. In contrast, curriculum learning offers an elegant and much simpler way of training agent policies in a step-by-step manner to master all aspects implicitly without extensive engineering. In this paper, we propose a generative curriculum approach to learning-based MAPF using Variational Autoencoder Utilized Learning of Terrains (VAULT). We introduce a two-stage framework to (I) train the VAULT via unsupervised learning to obtain a latent space representation of maps and (II) use the VAULT to generate curricula in order to improve sample efficiency and generalization of learning-based MAPF methods. For the second stage, we propose a bi-level curriculum scheme by combining our VAULT curriculum with a low-level curriculum method to improve sample efficiency further. Our framework is designed in a modular and general way, where each proposed component serves its purpose in a black-box manner without considering specific micro-aspects of the underlying problem. We empirically evaluate our approach on maps from the public MAPF benchmark set as well as novel artificial maps generated with the VAULT. Our results demonstrate the effectiveness of the VAULT as a map generator and our VAULT curriculum in improving sample efficiency and generalization of learning-based MAPF methods compared to alternative approaches. We also demonstrate how data pruning can further reduce the dependence on available maps without affecting the generalization potential of our approach.

Further information

Publication type:	Journal article
Peer-reviewed:	Yes
Keywords:	Curriculum learning; multi-agent path finding; reinforcement learning; unsupervised learning
University institutions:	Faculties > Faculty of Mathematics, Physics and Computer Science > Institute of Computer Science; Faculties; Faculties > Faculty of Mathematics, Physics and Computer Science; Faculties > Faculty of Mathematics, Physics and Computer Science > Institute of Computer Science > Junior Professorship Artificial Intelligence and Machine Learning > Junior Professorship Artificial Intelligence and Machine Learning - Juniorprof. Dr. Thomy Phan
Title originated at UBT:	No
DDC subject areas:	000 Computer science, information science, general works > 004 Computer science
Date deposited:	17 Nov 2025 08:35
Last modified:	25 Nov 2025 06:34
URI:	https://eref.uni-bayreuth.de/id/eprint/95267