Perform policy iteration (discounted reward criterion) on the MDP.
Source: R/mdp.R
runPolicyIteDiscount.Rd
The resulting policy can afterwards be retrieved using the functions getPolicy and getPolicyW.
Usage
runPolicyIteDiscount(
  mdp,
  w,
  dur,
  rate = 0,
  rateBase = 1,
  discountFactor = NULL,
  maxIte = 100,
  discountMethod = "continuous",
  getLog = TRUE
)
Arguments
- mdp
The MDP loaded using loadMDP().
- w
The label of the weight we optimize.
- dur
The label of the duration/time such that discount rates can be calculated.
- rate
The interest rate.
- rateBase
The time-horizon the rate is valid over.
- discountFactor
The discount factor for one time unit. If specified, rate and rateBase are not used to calculate the discount factor.
- maxIte
Maximum number of iterations. If the model does not satisfy the unichain assumption, the algorithm may loop.
- discountMethod
Either 'continuous' or 'discrete', corresponding to discount factor exp(-rate/rateBase) or 1/(1 + rate/rateBase), respectively. Only used if discountFactor is NULL.
- getLog
Output the log messages.
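The way rate, rateBase, discountFactor and discountMethod interact can be illustrated with a minimal base R sketch. The helper unitDiscountFactor below is hypothetical and not part of the package; it only reproduces the two formulas stated under discountMethod and assumes that an explicitly supplied discountFactor takes precedence, as the argument descriptions suggest.

```r
# Hypothetical helper (not part of the package): the discount factor
# for one time unit, following the argument descriptions above.
unitDiscountFactor <- function(rate = 0, rateBase = 1, discountFactor = NULL,
                               discountMethod = c("continuous", "discrete")) {
  # Assumption: an explicit discount factor overrides rate and rateBase.
  if (!is.null(discountFactor)) return(discountFactor)
  discountMethod <- match.arg(discountMethod)
  if (discountMethod == "continuous") {
    exp(-rate / rateBase)        # continuous compounding
  } else {
    1 / (1 + rate / rateBase)    # discrete compounding
  }
}

# A yearly interest rate of 10% with one time unit equal to one day.
unitDiscountFactor(rate = 0.1, rateBase = 365)
unitDiscountFactor(rate = 0.1, rateBase = 365, discountMethod = "discrete")

# A directly specified factor is returned unchanged.
unitDiscountFactor(discountFactor = 0.95)
```

With the default rate = 0 both methods give a factor of 1, that is, no discounting, so a positive rate or an explicit discountFactor is presumably needed for the discounted criterion to differ from the undiscounted one.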