Perform policy iteration (average reward criterion) on the MDP. — runPolicyIteAve • MDP2

The policy can afterwards be received using functions getPolicy and getPolicyW.

Usage

runPolicyIteAve(mdp, w, dur, maxIte = 100, getLog = TRUE)

Arguments

mdp: The MDP loaded using loadMDP().
w: The label of the weight we optimize.
dur: The label of the duration/time such that discount rates can be calculated.
maxIte: Max number of iterations. If the model does not satisfy the unichain assumption the algorithm may loop.
getLog: Output the log messages.

Value

The optimal gain (g) calculated.

See also