Goal for an MDP
Find a policy which:
  maximises expected discounted reward
  over an infinite horizon for a
  fully observable
  Markov decision process.
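
The goal above can be written as a single objective. This is a sketch in one common notation; the symbols are assumptions that do not appear on the slide: \pi is a policy, \gamma is the discount factor, R is the reward function, and s_t, a_t are the state and action at step t.

```latex
\pi^{*} \;=\; \arg\max_{\pi}\;
  \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)
  \;\middle|\; a_t = \pi(s_t) \right],
  \qquad 0 \le \gamma < 1
```

Requiring \gamma < 1 keeps the infinite sum finite whenever rewards are bounded. "Fully observable" means the policy can condition directly on the true state s_t.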
Why shouldn’t the planner find a plan?
What is a policy?
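
One way to make both questions concrete is a minimal sketch on a hypothetical two-state MDP. The states, actions, probabilities, rewards and discount below are invented for illustration. Value iteration is used here only as one standard way to obtain a policy; the slide does not name a method. In the sketch, "go" from A fails 20% of the time, so a fixed action sequence cannot know in advance which state it will be in, whereas a policy gives an action for every state.

```python
# Hypothetical two-state MDP, invented for illustration only.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "A": {
        "stay": [(1.0, "A", 0.0)],
        "go": [(0.8, "B", 0.0), (0.2, "A", 0.0)],  # "go" can fail
    },
    "B": {
        "stay": [(1.0, "B", 1.0)],
        "go": [(1.0, "A", 0.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def q_value(state, action, values):
    """Expected discounted return of taking `action` in `state`, then following `values`."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(sweeps=500):
    """Repeatedly apply the Bellman optimality backup for a fixed number of sweeps."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(sweeps):
        values = {
            state: max(q_value(state, action, values) for action in TRANSITIONS[state])
            for state in TRANSITIONS
        }
    return values


def greedy_policy(values):
    """A policy is a mapping from every state to an action."""
    return {
        state: max(TRANSITIONS[state], key=lambda action: q_value(state, action, values))
        for state in TRANSITIONS
    }


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

The result is a dictionary with one action per state, not a sequence of actions, so it still says what to do when "go" fails and the agent remains in A.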