Date of Original Version

2004

Type

Conference Proceeding

Rights Management

http://books.nips.cc/papers/files/nips16/NIPS2003_CN11.pdf

Abstract

We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
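As a rough illustration of the approach the abstract describes, the sketch below implements one natural reading for a small tabular finite-horizon MDP: work backward in time and, at each step t, choose the one-step policy that maximizes the expected value-to-go under the baseline state distribution mu_t. Everything here (the name psdp, the arrays P, R, mu, and the unrestricted tabular policy class) is an assumption for illustration, not the paper's implementation, which concerns restricted policy classes and also handles POMDP settings.

```python
import numpy as np

def psdp(P, R, mu, T):
    """Sketch of baseline-weighted backward policy search (assumed layout).

    P  : (S, A, S) transition probabilities P[s, a, s']
    R  : (S, A)    one-step rewards
    mu : (T, S)    baseline state distributions, one per time step
    T  : horizon

    Returns a nonstationary policy as a (T, S) array of action indices.
    """
    S, A, _ = P.shape
    pi = np.zeros((T, S), dtype=int)
    V = np.zeros(S)  # value-to-go under the already-chosen tail policy
    for t in reversed(range(T)):
        # Q[s, a] = R[s, a] + sum_{s'} P[s, a, s'] * V[s']
        Q = R + P @ V
        # Maximize the baseline-weighted objective E_{s ~ mu_t}[Q(s, pi(s))].
        # With an unrestricted tabular policy class this decouples into a
        # per-state argmax wherever mu_t(s) > 0.
        pi[t] = (mu[t][:, None] * Q).argmax(axis=1)
        V = Q[np.arange(S), pi[t]]
    return pi
```

With an unrestricted tabular class the baseline weighting drops out of each per-state choice; its real force appears when the per-step policies must come from a restricted class, where states with higher baseline weight dominate the fit and yield the finite-termination and performance guarantees the abstract mentions.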

Included in

Robotics Commons
