Date of Original Version
Abstract or Description
We study optimization for collaborative multi-agent planning in factored Markov decision processes (MDPs) with shared resource constraints. Following past research, we derive a distributed planning algorithm for this setting based on Lagrangian relaxation: we optimize a convex dual function which maps a vector of resource prices to a bound on the achievable utility. Since the dual function is not differentiable, the most common method for optimizing it is subgradient descent. This method is appealing, since we can compute the subgradient by asking each agent to plan independently of the others using the current resource prices; however, subgradient descent requires O(ε⁻²) iterations to achieve accuracy ε, and therefore the overall Lagrangian relaxation algorithm can have trouble scaling to realistic domains. Instead, we propose to optimize a smoothed version of the dual function via a fast proximal gradient algorithm. By trading the error caused by smoothing against the faster convergence of the proximal gradient method, we demonstrate that we can obtain faster (O(ε⁻¹)) convergence of the overall Lagrangian relaxation. Furthermore, we propose a particular smoothing method, based on maximum causal entropy, for which the subgradient calculation remains simple and efficient.
Proceedings of NIPS workshop on Optimization for Machine Learning.
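The contrast the abstract draws between the non-differentiable dual and a smoothed dual can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's method: each agent chooses among a small finite list of hypothetical plans (a reward and a resource-usage vector) rather than solving an MDP, and the smoothing is plain entropy smoothing (a log-sum-exp in place of each agent's max) rather than maximum causal entropy. The names `hard_dual`, `smoothed_dual`, `agents`, `budget`, and `mu` are invented for this sketch.

```python
import math


def plan_scores(prices, plans):
    """Price-adjusted value of each plan: reward minus priced resource usage."""
    return [r - sum(p * u for p, u in zip(prices, use)) for r, use in plans]


def hard_dual(prices, agents, budget):
    """Dual value and one subgradient; each agent picks its best plan alone."""
    value = sum(p * b for p, b in zip(prices, budget))
    subgrad = list(budget)
    for plans in agents:
        scores = plan_scores(prices, plans)
        best = max(range(len(plans)), key=scores.__getitem__)
        value += scores[best]
        subgrad = [g - u for g, u in zip(subgrad, plans[best][1])]
    return value, subgrad


def smoothed_dual(prices, agents, budget, mu):
    """Entropy-smoothed dual: each max becomes mu * log-sum-exp(scores / mu)."""
    value = sum(p * b for p, b in zip(prices, budget))
    grad = list(budget)
    for plans in agents:
        scores = plan_scores(prices, plans)
        top = max(scores)  # shift by the max for numerical stability
        weights = [math.exp((s - top) / mu) for s in scores]
        total = sum(weights)
        value += top + mu * math.log(total)
        # The gradient uses softmax-weighted expected resource usage.
        for w, (_, use) in zip(weights, plans):
            grad = [g - (w / total) * u for g, u in zip(grad, use)]
    return value, grad


# Hypothetical toy problem: two agents, one shared resource with budget 1.
agents = [
    [(0.0, [0.0]), (3.0, [1.0])],
    [(0.0, [0.0]), (2.0, [1.0])],
]
budget = [1.0]
prices = [1.0]

hard_value, hard_subgrad = hard_dual(prices, agents, budget)
smooth_value, smooth_grad = smoothed_dual(prices, agents, budget, mu=0.5)
```

In this sketch the smoothed value lies between the hard dual value and the hard value plus `mu` times the sum over agents of the log of the number of plans, which is the smoothing error that gets traded against faster convergence. As `mu` shrinks, the softmax weights concentrate on each agent's best plan and the gradient approaches the subgradient.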