Goal for an MDP
•
Find a
policy
which:
maximises
expected
discounted reward
for
an
infinite horizon
for a
fully observable
Markov
decision process.
Why shouldn’t the planner find a plan??
What is a policy??