Publications of Freek Stulp


Adaptive Exploration for Continual Reinforcement Learning
Freek Stulp. Adaptive Exploration for Continual Reinforcement Learning. In International Conference on Intelligent Robots and Systems (IROS), pp. 1631–1636, 2012.
Download
[PDF] (700.6 kB)
Abstract
Most experiments on policy search for robotics focus on isolated tasks, where the experiment is split into two distinct phases: 1) the learning phase, where the robot learns the task through exploration; 2) the exploitation phase, where exploration is turned off, and the robot demonstrates its performance on the task it has learned. In this paper, we present an algorithm that enables robots to continually and autonomously alternate between these phases. We do so by combining the `Policy Improvement with Path Integrals' direct reinforcement learning algorithm with the covariance matrix adaptation rule from the `Cross-Entropy Method' optimization algorithm. This integration is possible because both algorithms iteratively update parameters with probability-weighted averaging. A practical advantage of the novel algorithm, called PI2-CMA, is that it alleviates the user from having to manually tune the degree of exploration. We evaluate PI2-CMA's ability to continually and autonomously tune exploration on two tasks.
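To make the shared mechanism concrete, here is a minimal NumPy sketch of one probability-weighted averaging iteration in the style of PI2-CMA. It is an illustration only, not the paper's implementation: the function name, the cost normalization, and the constants (n_samples, h, the regularization term) are assumptions for the sake of a runnable example.

import numpy as np

def weighted_update(theta, sigma, cost_fn, n_samples=20, h=10.0):
    """One PI2-CMA-style iteration: sample, weight by cost, average."""
    dim = theta.shape[0]
    # Exploration: perturb the current policy parameters with Gaussian noise.
    samples = np.random.multivariate_normal(theta, sigma, size=n_samples)
    costs = np.array([cost_fn(s) for s in samples])
    # Map costs to probabilities (exponentiated, normalized):
    # lower cost gives higher weight, as in PI2-style updates.
    c = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    weights = np.exp(-h * c)
    weights /= weights.sum()
    # Probability-weighted averaging of the sampled parameters (PI2 step).
    theta_new = weights @ samples
    # The same weights also adapt the exploration covariance (CEM-style
    # covariance matrix adaptation): exploration shrinks as good samples
    # cluster and grows when they spread, with no manual tuning.
    diff = samples - theta_new
    sigma_new = np.einsum('n,ni,nj->ij', weights, diff, diff)
    sigma_new += 1e-6 * np.eye(dim)  # keep the covariance positive definite
    return theta_new, sigma_new

# Toy usage: minimize a quadratic cost; exploration decays on its own.
theta, sigma = np.array([2.0, -1.5]), 0.5 * np.eye(2)
for _ in range(50):
    theta, sigma = weighted_update(theta, sigma, lambda x: float(x @ x))
print(theta, np.trace(sigma))  # theta approaches 0; covariance shrinks

The point of the sketch is the last step: because both the parameter mean and the exploration covariance are updated from the same probability weights, the degree of exploration regulates itself rather than being set by hand.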
BibTeX
@InProceedings{stulp12adaptive,
  title                    = {Adaptive Exploration for Continual Reinforcement Learning},
  author                   = {Freek Stulp},
  booktitle                = {International Conference on Intelligent Robots and Systems (IROS)},
  year                     = {2012},
  pages                    = {1631-1636},
  abstract                 = {Most experiments on policy search for robotics focus on isolated tasks, where the experiment is split into two distinct phases: 1)~the learning phase, where the robot learns the task through exploration; 2)~the exploitation phase, where exploration is turned off, and the robot demonstrates its performance on the task it has learned. In this paper, we present an algorithm that enables robots to continually and autonomously alternate between these phases. We do so by combining the `Policy Improvement with Path Integrals' direct reinforcement learning algorithm with the covariance matrix adaptation rule from the `Cross-Entropy Method' optimization algorithm. This integration is possible because both algorithms iteratively update parameters with probability-weighted averaging. A practical advantage of the novel algorithm, called PI2-CMA, is that it alleviates the user from having to manually tune the degree of exploration. We evaluate PI2-CMA's ability to continually and autonomously tune exploration on two tasks.},
  bib2html_pubtype         = {Refereed Conference Paper},
  bib2html_rescat          = {Reinforcement Learning of Robot Skills}
}

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.


Generated by bib2html.pl (written by Patrick Riley) on Mon Jul 20, 2015 21:50:11