Reward/cost
Each action has an associated cost.
The agent may accrue rewards at different
stages. A reward may depend on:
  The current state
  The (current state, action) pair
  The (current state, action, next state) triplet
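
The three dependency forms listed above can be sketched as three function signatures. This is a minimal illustration only: the integer states, the "move"/"stay" actions, the goal state 3, and all numeric values are hypothetical and not taken from these notes.

```python
State = int
Action = str

GOAL: State = 3          # hypothetical goal state
MOVE_COST: float = 0.1   # hypothetical cost charged for the "move" action


def reward_state(s: State) -> float:
    """R(s): depends only on the current state."""
    return 1.0 if s == GOAL else 0.0


def reward_state_action(s: State, a: Action) -> float:
    """R(s, a): depends on the current state and the action taken in it."""
    cost = MOVE_COST if a == "move" else 0.0
    return reward_state(s) - cost


def reward_transition(s: State, a: Action, s_next: State) -> float:
    """R(s, a, s'): depends on the current state, the action, and the next state."""
    cost = MOVE_COST if a == "move" else 0.0
    arrival_bonus = 1.0 if s_next == GOAL else 0.0
    return arrival_bonus - cost
```

In this sketch the action cost is folded into the reward as a negative term, which is one common way to treat costs and rewards uniformly.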
Additivity assumption: costs and rewards are
additive, so the total along a state sequence
s0, s1, s2, … is the sum of the per-stage values.
Reward accumulated = R(s0) + R(s1) + R(s2) + …
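
Under the additivity assumption, the accumulated reward for a finite state sequence is a plain sum. The following is a small sketch assuming a state-only reward R(s); the function name `accumulated_reward` and the reward table are hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence


def accumulated_reward(states: Sequence[int],
                       reward: Callable[[int], float]) -> float:
    """Sum R(s0) + R(s1) + ... over a finite sequence of visited states."""
    return sum(reward(s) for s in states)


# Hypothetical reward table: each non-goal step costs 1, reaching state 3 pays 10.
reward_table = {0: 0.0, 1: -1.0, 2: -1.0, 3: 10.0}

total = accumulated_reward([0, 1, 2, 3], reward_table.__getitem__)
print(total)  # 0.0 + (-1.0) + (-1.0) + 10.0 = 8.0
```

Because costs enter as negative rewards here, a single sum covers both; keeping them as separate totals would work equally well under the same assumption.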