The policy can afterwards be retrieved using the functions getPolicy and getPolicyW.
Arguments
- mdp
The MDP loaded using loadMDP().
- w
The label of the weight we optimize.
- dur
The label of the duration/time such that discount rates can be calculated.
- maxIte
Maximum number of iterations. If the model does not satisfy the unichain assumption, the algorithm may loop, so this limit guarantees that it stops.
- getLog
Whether to output the log messages.