The policy can afterwards be retrieved using the functions get_policy and get_policy_w.

Usage

run_policy_ite_ave(
  mdp,
  w,
  dur,
  max_ite = 100,
  objective = c("max", "min"),
  get_log = TRUE
)

Arguments

mdp

The MDP loaded using load_mdp().

w

The label of the weight we optimize.

dur

The label of the duration/time such that discount rates can be calculated.

max_ite

Maximum number of iterations. If the model does not satisfy the unichain assumption, the algorithm may loop, so this limit ensures that it terminates.

objective

Optimize by maximizing ("max") or minimizing ("min") the Bellman value.

get_log

If TRUE, output the log messages.

Value

The optimal gain (g) calculated.
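
To illustrate what the gain g represents, the following is a minimal base R sketch of policy iteration under the average reward per time unit criterion on a toy two-state model. It is not the package's implementation: the function policy_ite_ave_sketch, the matrices P, r and tau, and all numbers are hypothetical and chosen only for illustration. The sketch assumes a unichain model and only handles the "max" objective.

```r
# Toy model (all numbers are invented for illustration).
# P[[a]][s, j]: transition probability from state s to j under action a.
# r[s, a]: reward (the weight to optimize), tau[s, a]: duration.
P <- list(
  rbind(c(0.9, 0.1), c(0.5, 0.5)),  # action 1
  rbind(c(0.2, 0.8), c(0.1, 0.9))   # action 2
)
r   <- cbind(c(1, 0), c(0, 3))
tau <- cbind(c(1, 1), c(1, 2))

# Hypothetical helper, not part of the package.
policy_ite_ave_sketch <- function(P, r, tau, max_ite = 100) {
  n <- nrow(r)
  n_act <- ncol(r)
  idx <- seq_len(n)
  policy <- rep(1L, n)  # start with action 1 in every state
  g <- NA_real_
  for (ite in seq_len(max_ite)) {
    # Policy evaluation: solve
    #   g * tau(s) + h(s) - sum_j P(s, j) h(j) = r(s)
    # with h(n) fixed at 0, so the unknowns are h(1), ..., h(n - 1) and g.
    Pd   <- do.call(rbind, lapply(idx, function(s) P[[policy[s]]][s, ]))
    rd   <- r[cbind(idx, policy)]
    taud <- tau[cbind(idx, policy)]
    A <- diag(n) - Pd
    A[, n] <- taud  # the column of h(n) = 0 is reused for the unknown g
    sol <- solve(A, rd)
    g <- sol[n]
    h <- c(sol[-n], 0)

    # Policy improvement: maximize r - g * tau + P h in every state.
    q <- sapply(seq_len(n_act), function(a) {
      r[, a] - g * tau[, a] + as.vector(P[[a]] %*% h)
    })
    new_policy <- max.col(q, ties.method = "first")
    # Keep the current action on ties so the iteration cannot cycle.
    best <- q[cbind(idx, new_policy)]
    cur  <- q[cbind(idx, policy)]
    new_policy <- ifelse(cur >= best - 1e-12, policy, new_policy)

    if (all(new_policy == policy)) break
    policy <- new_policy
  }
  list(g = g, policy = policy, ite = ite)
}

res <- policy_ite_ave_sketch(P, r, tau)
print(res$g)       # optimal gain: average reward per time unit
print(res$policy)  # action chosen in each state
```

For this toy model the iteration stops with action 2 in both states and a gain of 24/17 (about 1.41). Under that policy the stationary probabilities are 1/9 and 8/9, so the gain equals the expected reward per transition (24/9) divided by the expected duration per transition (17/9).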

See also