Policy evaluation
Given a policy Π: S → A, find the value of each
state under this policy.
V^Π(s) = R(s) + c(Π(s)) +
         γ Σ_{s'∈S} Pr(s' | Π(s), s) V^Π(s')
This is a system of |S| linear equations
in |S| unknowns (one value per state).
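
Because the equations are linear, they can be stacked into matrix form, (I - γ P_Π) V = R + c_Π, and handed to a standard linear solver. The sketch below shows one way to do this with NumPy. The array layout (P[a, s, s'], R[s], c[a]), the function name evaluate_policy, and the two-state example are assumptions made for illustration; they are not part of the original material. It also assumes γ < 1, so that the system has a unique solution.

```python
import numpy as np

def evaluate_policy(P, R, c, policy, gamma):
    """Solve (I - gamma * P_pi) V = R + c_pi for the state values V.

    Assumed layout (illustrative only):
      P[a, s, s2] = Pr(s2 | a, s)   transition probabilities
      R[s]        = reward of state s
      c[a]        = action term, added as in the slide (a cost would be negative)
      policy[s]   = action chosen in state s
    """
    n = len(R)
    states = np.arange(n)
    # Row s of P_pi is the transition distribution under the policy's action in s.
    P_pi = P[policy, states, :]
    b = R + c[policy]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, b)

# Hypothetical two-state, two-action example.
P = np.zeros((2, 2, 2))
P[0] = np.eye(2)                  # action 0: stay in the current state
P[1] = [[0.0, 1.0], [1.0, 0.0]]   # action 1: move to the other state
R = np.array([1.0, 2.0])
c = np.array([0.0, -1.0])
gamma = 0.5

V_stay = evaluate_policy(P, R, c, np.array([0, 0]), gamma)
# V_stay is approximately [2.0, 4.0], since V(s) = R(s) / (1 - gamma) when staying put
```

A direct solve costs on the order of |S|^3 operations, which is fine for small state spaces. For large ones, repeatedly applying the right-hand side of the equation as an update rule is a common alternative.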