On the Sample Complexity of
Reinforcement Learning
Sham M. Kakade
Gatsby Computational Neuroscience Unit
University College London, 2003
[pdf]
Abstract:
This thesis is a detailed investigation into the following question:
how much data must an agent collect in order to perform
"reinforcement learning" successfully? This question is analogous to
the classical issue of the sample complexity in supervised learning,
but is harder because of the increased realism of the reinforcement
learning setting. This thesis summarizes recent sample complexity
results in the reinforcement learning literature and builds on these
results to provide novel algorithms with strong performance
guarantees.
We focus on a variety of reasonable performance criteria and
sampling models by which agents may access the environment. For
instance, in a policy search setting, we consider the problem of how
much simulated experience is required to reliably choose a "good"
policy among a restricted class of policies \Pi (as in Kearns,
Mansour, and Ng [2000]). In a more online setting, we consider the
case in which an agent is placed in an environment and must follow
one unbroken chain of experience with no access to "offline"
simulation (as in Kearns and Singh [1998]).
We build on the sample based algorithms suggested by Kearns,
Mansour, and Ng [2000]. Their sample complexity bounds have no
dependence on the size of the state space, an exponential dependence
on the planning horizon time, and linear dependence on the
complexity of \Pi . We suggest novel algorithms with more restricted
guarantees whose sample complexities are again independent of the
size of the state space and depend linearly on the complexity of the
policy class \Pi , but have only a polynomial dependence on the
horizon time. We pay particular attention to the tradeoffs made by
such algorithms.
