• Each action has an associated cost.
• An agent may accrue rewards at different stages. A reward may depend on:
  - the current state, R(s),
  - the (current state, action) pair, R(s, a), or
  - the (current state, action, next state) triplet, R(s, a, s').
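The three reward dependencies can be written as three function signatures. The Python sketch below assumes a hypothetical toy world where states are integers, actions are strings, and state 3 is a goal. The names `reward_s`, `reward_sa`, `reward_sas`, `lift_state_reward` and all numeric values are illustrative and are not from the slides.

```python
from typing import Callable

State = int
Action = str

# R(s): the reward depends only on the current state.
def reward_s(s: State) -> float:
    return 1.0 if s == 3 else 0.0  # hypothetical: state 3 is the goal

# R(s, a): the reward also depends on the action taken, e.g. to charge an action cost.
def reward_sa(s: State, a: Action) -> float:
    action_cost = 0.5 if a == "jump" else 0.1  # hypothetical per-action costs
    return reward_s(s) - action_cost

# R(s, a, s'): the reward depends on the whole transition.
def reward_sas(s: State, a: Action, s_next: State) -> float:
    return 1.0 if s_next == 3 and s != 3 else 0.0  # hypothetical: reward for entering the goal

# The triplet form is the most general: a state-only reward is the special
# case that ignores the action and the next state.
def lift_state_reward(r: Callable[[State], float]) -> Callable[[State, Action, State], float]:
    return lambda s, a, s_next: r(s)
```

Because R(s) and R(s, a) can always be rewritten as an R(s, a, s') that ignores some arguments, code that accepts the triplet form can handle all three cases.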
• Additivity assumption: costs and rewards are additive, i.e. the total over a sequence of stages is the sum of the per-stage values.
• Accumulated reward = R(s₀) + R(s₁) + R(s₂) + …
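Under the additivity assumption, accumulating rewards (and costs) over a finite trajectory is just a sum. The Python sketch below assumes state-only rewards R(s) and per-action costs; the trajectory, `goal_reward`, and `move_cost` values are hypothetical examples, not from the slides.

```python
from typing import Callable, Sequence

State = int
Action = str

def accumulated_reward(states: Sequence[State], reward: Callable[[State], float]) -> float:
    # Additivity: the total is R(s0) + R(s1) + R(s2) + ... over the visited states.
    return sum(reward(s) for s in states)

def accumulated_cost(actions: Sequence[Action], cost: Callable[[Action], float]) -> float:
    # Action costs are additive in the same way: c(a0) + c(a1) + ...
    return sum(cost(a) for a in actions)

# Hypothetical example values.
def goal_reward(s: State) -> float:
    return 10.0 if s == 3 else -1.0

def move_cost(a: Action) -> float:
    return {"right": 1.0, "up": 2.0}[a]

states = [0, 1, 2, 3]
actions = ["right", "right", "up"]
print(accumulated_reward(states, goal_reward))  # 7.0
print(accumulated_cost(actions, move_cost))     # 4.0
```

A practical consequence of additivity is that a trajectory can be split anywhere: the total for the whole sequence equals the total for the prefix plus the total for the suffix.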