Publications Freek Stulp

Reinforcement Learning of Full-body Humanoid Motor Skills
	Freek Stulp, Jonas Buchli, Evangelos Theodorou, and Stefan Schaal. Reinforcement Learning of Full-body Humanoid Motor Skills. In 10th IEEE-RAS International Conference on Humanoid Robots, pp. 405–410, 2010. Best paper finalist
	Abstract
	Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and their state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI^2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high-dimensional learning problems. We demonstrate how PI^2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI^2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.
	BibTeX

@InProceedings{stulp10reinforcement,
  title                    = {Reinforcement Learning of Full-body Humanoid Motor Skills},
  author                   = {Freek Stulp and Jonas Buchli and Evangelos Theodorou and Stefan Schaal},
  booktitle                = {10th IEEE-RAS International Conference on Humanoid Robots},
  year                     = {2010},
  note                     = {{\bf Best paper finalist}},
  pages                    = {405--410},
  abstract                 = {Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and their state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI^2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high-dimensional learning problems. We demonstrate how PI^2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI^2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.},
  bib2html_pubtype         = {Refereed Conference Paper, Awards},
  bib2html_rescat          = {Reinforcement Learning of Robot Skills}
}

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints.
