Title details
Phan, Thomy ; Chan, Shao-Hung ; Koenig, Sven:
Counterfactual Online Learning for Open-Loop Monte-Carlo Planning.
In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 39 (2025), Issue 25. pp. 26651-26658.
ISSN 2159-5399
DOI: https://doi.org/10.1609/aaai.v39i25.34867
Project information
| Project funding: | Other: National Science Foundation (NSF) under grant numbers 1817189, 1837779, 1935712, 2121028, 2112533, and 2321786, as well as gifts from Amazon Robotics and the Donald Bren Foundation. |
|---|---|
Abstract
Monte-Carlo Tree Search (MCTS) is a popular approach to online planning under uncertainty. While MCTS uses statistical sampling via multi-armed bandits to avoid exhaustive search in complex domains, common closed-loop approaches typically construct enormous search trees to consider a large number of potential observations and actions. On the other hand, open-loop approaches offer better memory efficiency by ignoring observations but are generally not competitive with closed-loop MCTS in terms of performance - even with commonly integrated human knowledge. In this paper, we propose Counterfactual Open-loop Reasoning with Ad hoc Learning (CORAL) for open-loop MCTS, using a causal multi-armed bandit approach with unobserved confounders (MABUC). CORAL consists of two online learning phases that are conducted during the open-loop search. In the first phase, observational values are learned based on preferred actions. In the second phase, counterfactual values are learned with MABUCs to make a decision via an intent policy obtained from the observational values. We evaluate CORAL in four POMDP benchmark scenarios and compare it with closed-loop and open-loop alternatives. In contrast to standard open-loop MCTS, CORAL achieves competitive performance compared with closed-loop algorithms while constructing significantly smaller search trees.
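As a rough illustration of the two-phase structure the abstract describes, the following Python sketch runs a plain UCB1 bandit first and then learns a separate value table per (intent, action) pair. It is not the authors' CORAL implementation and does not build a search tree. The function names, the use of UCB1 in both phases, and the softmax intent policy are all assumptions made for this example.

```python
import math
import random


def ucb1_pick(counts, values, c=1.0):
    """Return the arm with the highest UCB1 score; untried arms are picked first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [v + c * math.sqrt(2.0 * math.log(total) / n)
              for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_mean(counts, values, arm, reward):
    """Incrementally update the running mean reward of one arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]


def two_phase_bandit(reward_fn, num_arms, phase1_steps, phase2_steps, seed=0):
    """Hypothetical two-phase learner: observational values, then intent-conditioned values."""
    rng = random.Random(seed)

    # Phase 1 (assumed form): learn observational values with plain UCB1.
    obs_counts = [0] * num_arms
    obs_values = [0.0] * num_arms
    for _ in range(phase1_steps):
        arm = ucb1_pick(obs_counts, obs_values)
        update_mean(obs_counts, obs_values, arm, reward_fn(arm, rng))

    # Phase 2 (assumed form): sample an intent from a softmax over the
    # observational values, then learn one value row per intent.
    top = max(obs_values)
    weights = [math.exp(v - top) for v in obs_values]
    cf_counts = [[0] * num_arms for _ in range(num_arms)]
    cf_values = [[0.0] * num_arms for _ in range(num_arms)]
    for _ in range(phase2_steps):
        intent = rng.choices(range(num_arms), weights=weights)[0]
        arm = ucb1_pick(cf_counts[intent], cf_values[intent])
        update_mean(cf_counts[intent], cf_values[intent], arm, reward_fn(arm, rng))

    return obs_values, cf_values, cf_counts


def bernoulli_reward(arm, rng):
    """Toy reward model (assumption): arm 1 pays off more often than arm 0."""
    return 1.0 if rng.random() < (0.2, 0.8)[arm] else 0.0


obs_values, cf_values, cf_counts = two_phase_bandit(bernoulli_reward, 2, 200, 300)
```

In this toy setting the intent only selects which row of the value table is updated; a real confounded bandit would let the reward depend on a hidden variable that also drives the intent.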
Further information
| Publication type: | Article in a journal |
|---|---|
| Peer-reviewed contribution: | Yes |
| Keywords: | Planning; Counterfactual Reasoning; POMDP |
| Institutions of the University: | Faculties > Faculty of Mathematics, Physics and Computer Science > Institute of Computer Science |
| Result of work at the UBT: | No |
| DDC subjects: | 000 Computer science, information, general works > 004 Computer science |
| Date deposited: | 17 Nov 2025 13:34 |
| Last modified: | 17 Nov 2025 13:34 |
| URI: | https://eref.uni-bayreuth.de/id/eprint/95265 |
