




ID: 461796.0, MPI für biologische Kybernetik / Biologische Kybernetik
Fitted Q-iteration by Advantage Weighted Regression
Authors: Neumann, G.; Peters, J.
Editors: Koller, D.; Schuurmans, D.; Bengio, Y.; Bottou, L.
Date of Publication (YYYY-MM-DD): 2009-06
Title of Proceedings: Advances in Neural Information Processing Systems 21: Proceedings of the 2008 Conference
Start Page: 1177
End Page: 1184
Physical Description: 8
Audience: Not Specified
Intended Educational Use: No
Abstract / Description: Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.
External Publication Status: published
Document Type: Conference-Paper
Communicated by: Holger Fischer
Affiliations: MPI für biologische Kybernetik/Empirical Inference (Dept. Schölkopf)
Identifiers: LOCALID:5520
URL: http://nips.cc/Conferences/2008/
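
The abstract describes replacing the greedy policy improvement step with an advantage-weighted regression. As a rough illustration of that idea only, the sketch below fits a linear policy mean to sampled state-action pairs by weighted least squares, with weights that grow exponentially in the advantage. It is not taken from the paper: the function name `advantage_weighted_regression`, the temperature `tau`, the linear policy form, the `ridge` regulariser and the max-subtraction used for numerical stability are all assumptions made for this example.

```python
import numpy as np


def advantage_weighted_regression(states, actions, advantages, tau=1.0, ridge=1e-6):
    """Fit a linear policy mean a ~ [s, 1] @ W by weighted least squares.

    Hypothetical sketch: each sample is weighted by exp(tau * advantage), so
    actions with higher estimated advantage dominate the fit (soft-greedy).
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    if states.ndim == 1:
        states = states[:, None]
    if actions.ndim == 1:
        actions = actions[:, None]

    # Subtracting the maximum avoids overflow in exp; this assumed shift only
    # rescales all weights by a common factor.
    weights = np.exp(tau * (advantages - advantages.max()))

    # Design matrix with a bias column.
    X = np.hstack([states, np.ones((states.shape[0], 1))])

    # Weighted normal equations: (X^T D X + ridge * I) W = X^T D a.
    XtD = X.T * weights
    gram = XtD @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, XtD @ actions)


# Toy data: "good" actions follow a = 2 s + 1, "bad" actions follow a = -s.
s = np.linspace(-1.0, 1.0, 50)
toy_states = np.concatenate([s, s])
toy_actions = np.concatenate([2.0 * s + 1.0, -s])
toy_advantages = np.concatenate([np.ones(50), -np.ones(50)])

W_soft_greedy = advantage_weighted_regression(toy_states, toy_actions, toy_advantages, tau=20.0)
W_uniform = advantage_weighted_regression(toy_states, toy_actions, toy_advantages, tau=0.0)
```

With a large `tau` the fit concentrates on the high-advantage samples, while `tau=0` reduces to ordinary least squares over all samples. No maximisation over the continuous action space is needed in either case, which is the property the abstract emphasises.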