Perform policy iteration using the average expected-weight Bellman operator on the MDP.
Source: R/mdp.R
The policy can afterwards be retrieved using the functions get_policy() and get_policy_w().
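The following is a hedged summary of what the average criterion usually means. It assumes the standard semi-Markov formulation, because this page does not state the equations, and the package's internals may differ. Let w_s(a) be the expected weight and d_s(a) the expected duration of action a in state s. Policy iteration then alternates between two steps. First it evaluates the current policy π by solving for the average weight per time unit g and the relative values h. Then it improves the policy state by state:

```latex
\begin{aligned}
h(s) &= w_s(\pi(s)) - g\, d_s(\pi(s)) + \sum_{s'} p(s' \mid s, \pi(s))\, h(s'), \qquad h(s_0) = 0, \\
\pi'(s) &\in \arg\max_{a} \Big\{ w_s(a) - g\, d_s(a) + \sum_{s'} p(s' \mid s, a)\, h(s') \Big\}.
\end{aligned}
```

For objective = "min" the arg max becomes an arg min. The loop stops when the policy no longer changes. With a unichain model and positive expected durations, fixing one reference value h(s_0) = 0 makes the evaluation equations uniquely solvable.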
Usage
run_policy_ite_ave(
mdp,
w,
dur,
max_ite = 100,
objective = c("max", "min"),
get_log = TRUE
)
Arguments
- mdp
The MDP loaded using load_mdp().
- w
The label of the weight we optimize.
- dur
The label of the duration/time, used to compute the average weight per time unit.
- max_ite
Maximum number of iterations. If the model does not satisfy the unichain assumption, the algorithm may loop without converging.
- objective
Optimize by maximizing ("max") or minimizing ("min") the Bellman value.
- get_log
Output the log messages.
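The following base-R sketch makes the policy-iteration steps concrete. It runs average-weight policy iteration on a tiny semi-Markov model stored in plain R lists. The helper policy_ite_ave_sketch() and its inputs P, W and D are hypothetical and for illustration only. The sketch does not use MDP2 and does not reproduce the internals of run_policy_ite_ave(). It assumes that every action is available in every state and that the model is unichain. Its max_ite and objective arguments loosely mirror the ones documented above.

```r
# Hypothetical sketch of average-weight policy iteration for a small unichain
# semi-Markov model stored in plain R lists (NOT the MDP2 implementation).
# P[[a]]: transition matrix of action a; W[[a]], D[[a]]: expected weight and
# expected duration of action a in each state. Assumes every action is
# available in every state.
policy_ite_ave_sketch <- function(P, W, D, max_ite = 100,
                                  objective = c("max", "min")) {
  objective <- match.arg(objective)
  sgn <- if (objective == "max") 1 else -1  # turn "min" into a max problem
  n <- nrow(P[[1]])
  n_act <- length(P)
  policy <- rep(1L, n)  # start with action 1 in every state
  for (ite in seq_len(max_ite)) {
    # Policy evaluation: solve (I - P_pi) h + g * d_pi = w_pi with h[n] = 0;
    # the unknowns are h[1..n-1] and g.
    P_pi <- t(sapply(seq_len(n), function(s) P[[policy[s]]][s, ]))
    w_pi <- sapply(seq_len(n), function(s) W[[policy[s]]][s])
    d_pi <- sapply(seq_len(n), function(s) D[[policy[s]]][s])
    A <- cbind((diag(n) - P_pi)[, -n, drop = FALSE], d_pi)
    sol <- unname(solve(A, w_pi))
    h <- c(sol[-n], 0)
    g <- sol[n]
    # Policy improvement: test quantity w - g * d + P h for every action.
    new_policy <- policy
    for (s in seq_len(n)) {
      vals <- sapply(seq_len(n_act), function(a)
        W[[a]][s] - g * D[[a]][s] + sum(P[[a]][s, ] * h))
      best <- which.max(sgn * vals)
      # Keep the current action on ties to avoid cycling between equal policies.
      if (sgn * vals[best] > sgn * vals[policy[s]] + 1e-12) new_policy[s] <- best
    }
    if (all(new_policy == policy)) {
      return(list(policy = policy, g = g, h = h, iterations = ite,
                  converged = TRUE))
    }
    policy <- new_policy
  }
  warning("max_ite reached; the policy may be cycling (is the model unichain?)")
  list(policy = policy, iterations = max_ite, converged = FALSE)
}

# Two states; transitions are the same for both actions.
P <- list(matrix(0.5, 2, 2), matrix(0.5, 2, 2))
W <- list(c(1, 1), c(3, 0))  # expected weights of action 1 and action 2
D <- list(c(1, 1), c(2, 1))  # expected durations of action 1 and action 2
res <- policy_ite_ave_sketch(P, W, D)
res$policy  # expected: 2 1
res$g       # expected: 4/3 (about 1.333333)
res_min <- policy_ite_ave_sketch(P, W, D, objective = "min")
res_min$policy  # expected: 1 2, with g = 0.5
```

Two design choices in the sketch deserve a note. Fixing h at the last state to 0 makes the evaluation system square. Keeping the current action on ties is a common guard against cycling between equally good policies. If the model is not unichain, the sketch can still fail to converge, and max_ite then bounds the loop.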