
The retention pay-off (RPO) of a state is defined as the difference between the weight of the state when using action i_a and the maximum weight of the state (node) when using any other action (predecessor) than i_a.
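As a minimal illustration of this definition (not the package's implementation), assume a single state whose weight has already been computed under each of three candidate actions. The vector `w_by_action` and its values are hypothetical, and the sketch uses R's 1-based positions rather than the action indices used by the package.

```r
# Hypothetical weights of one state under each candidate action
w_by_action <- c(10, 7, 12)

# Position of the action of interest (1-based here, an assumption of this sketch)
i_a <- 1

# Weight under i_a minus the best weight among the remaining actions
rpo <- w_by_action[i_a] - max(w_by_action[-i_a])
rpo
```

Here the result is -2: another action gives a weight that is 2 higher than the weight under i_a. A positive value would mean that i_a gives a strictly higher weight than every alternative.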

Usage

get_rpo(
  mdp,
  w,
  i_a,
  s_id = ifelse(mdp$time_horizon >= Inf, mdp$founder_states_last + 1,
    1):ifelse(mdp$time_horizon >= Inf, mdp$states + mdp$founder_states_last, mdp$states)
    - 1,
  criterion = "expected",
  dur = "",
  rate = 0,
  rate_base = 1,
  discount_factor = NULL,
  g = 0,
  objective = c("max", "min"),
  discount_method = "continuous",
  state_str = TRUE
)
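The default for s_id is easier to read when evaluated on small inputs. The sketch below copies the default expression from the usage above and applies it to two hypothetical lists that only mimic the three fields the expression reads (`time_horizon`, `states`, `founder_states_last`); they are not real models, and `default_s_id` is a helper introduced here for illustration.

```r
# Hypothetical stand-ins for the fields used by the default; not real models
mdp_finite   <- list(time_horizon = 4,   states = 5, founder_states_last = 0)
mdp_infinite <- list(time_horizon = Inf, states = 5, founder_states_last = 3)

# Same expression as the default for s_id, wrapped in a helper for reuse.
# Note that `:` binds tighter than `-`, so 1 is subtracted from every element.
default_s_id <- function(mdp) {
  ifelse(mdp$time_horizon >= Inf, mdp$founder_states_last + 1, 1):
    ifelse(mdp$time_horizon >= Inf, mdp$states + mdp$founder_states_last, mdp$states) - 1
}

default_s_id(mdp_finite)    # 0 1 2 3 4
default_s_id(mdp_infinite)  # 3 4 5 6 7
```

Under these assumptions the default selects all state ids (counted from 0) for a finite time horizon and skips the ids up to founder_states_last - 1 for an infinite one.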

Arguments

mdp

The MDP loaded using load_mdp().

w

The label of the weight we calculate RPO for.

i_a

Vector of action indices to calculate the RPO with respect to, one per state in s_id (same length as s_id).

s_id

Vector of ids of the states we want to retrieve.

criterion

The Bellman operator shortcut. If "expected", use expected weights; if "discount", use discounted expected weights; if "average", use average expected weights.

dur

The label of the duration/time such that discount rates can be calculated.

rate

The interest rate.

rate_base

The time horizon the rate is valid over.

discount_factor

The discount factor for one time unit. If specified, rate and rate_base are not used to calculate the discount factor.

g

The calculated optimal gain (g). Only used if criterion = "average".

objective

Optimize by maximizing ("max") or minimizing ("min") the Bellman value.

discount_method

Either 'continuous' or 'discrete', corresponding to discount factor exp(-rate/rate_base) or 1/(1 + rate/rate_base), respectively. Only used if discount_factor is NULL.

state_str

If TRUE, output the state string.
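The two formulas given for discount_method can be compared with a short sketch. The helper `discount_factor_sketch` is hypothetical and only mirrors the formulas stated above; it is not a function from the package.

```r
# Hypothetical helper mirroring the two formulas described for discount_method
discount_factor_sketch <- function(rate, rate_base = 1,
                                   method = c("continuous", "discrete")) {
  method <- match.arg(method)
  if (method == "continuous") {
    exp(-rate / rate_base)        # continuous discounting
  } else {
    1 / (1 + rate / rate_base)    # discrete discounting
  }
}

discount_factor_sketch(0.1, 1, "continuous")  # exp(-0.1), about 0.905
discount_factor_sketch(0.1, 1, "discrete")    # 1 / 1.1, about 0.909
```

With rate = 0 (the default) both methods give a factor of 1, that is, no discounting.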

Value

The RPO (matrix/data frame).