Policy
- class maze.core.agent.policy.Policy
Structured policy class designed to work with structured environments (see StructuredEnv). It encapsulates policies and queries them for actions according to the provided policy ID.
- abstract compute_action(observation: Dict[str, numpy.ndarray], maze_state: Optional[Any], policy_id: Union[str, int] = None, deterministic: bool = False) → Dict[str, Union[int, numpy.ndarray]]
Query the policy that corresponds to the given ID for an action.
- Parameters
observation – Current observation of the environment
maze_state – Current state representation of the environment (only provided if needs_state() returns True)
policy_id – ID of the policy to query (may be omitted if the policies dict contains only one policy)
deterministic – Specify if the action should be computed deterministically
- Returns
Next action to take
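The dispatch behaviour described above can be sketched as follows. This is a minimal illustration, not Maze's own implementation: `DictDispatchPolicy` is a hypothetical stand-in that mirrors the documented `compute_action` signature without subclassing `maze.core.agent.policy.Policy`, so it runs without Maze installed. It assumes each sub-policy is a callable returning a list of action probabilities, uses plain lists instead of `numpy.ndarray`, and returns the action under an assumed key `"action"`.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Union

# Stand-in types: Maze uses Dict[str, numpy.ndarray]; plain lists keep the sketch dependency-free.
Observation = Dict[str, List[float]]
Action = Dict[str, int]


class DictDispatchPolicy:
    """Hypothetical stand-in mirroring the documented compute_action signature."""

    def __init__(self,
                 policies: Dict[Union[str, int], Callable[[Observation], List[float]]],
                 seed: int = 0):
        # Assumption: each sub-policy maps an observation to action probabilities.
        self.policies = policies
        self.rng = random.Random(seed)

    def needs_state(self) -> bool:
        # This sketch works on observations only.
        return False

    def compute_action(self,
                       observation: Observation,
                       maze_state: Optional[Any],
                       policy_id: Union[str, int] = None,
                       deterministic: bool = False) -> Action:
        if policy_id is None:
            # The ID may only be omitted when exactly one policy is registered.
            if len(self.policies) != 1:
                raise ValueError("policy_id is required when several policies are registered")
            policy_id = next(iter(self.policies))

        probs = self.policies[policy_id](observation)
        if deterministic:
            # Pick the most probable action.
            action = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Sample an action according to the probabilities.
            action = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return {"action": action}


# Usage: a single registered policy, so policy_id can be left out.
policy = DictDispatchPolicy({"pick": lambda obs: [0.1, 0.7, 0.2]})
action = policy.compute_action({"x": [0.0]}, maze_state=None, deterministic=True)
```

With `deterministic=True` the sketch takes the arg-max of the probabilities; with `deterministic=False` it samples from them. How a concrete Maze policy interprets the flag depends on its implementation.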
- abstract compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: int, maze_state: Optional[Any], policy_id: Union[str, int] = None, deterministic: bool = False) → Tuple[Sequence[Dict[str, Union[int, numpy.ndarray]]], Sequence[float]]
Get the top num_candidates actions together with the scores (e.g., probabilities or Q-values) that led to the decision.
- Parameters
observation – Current observation of the environment
num_candidates – The number of actions that should be returned
maze_state – Current state representation of the environment (only provided if needs_state() returns True)
policy_id – ID of the policy to query (may be omitted if the policies dict contains only one policy)
deterministic – Specify if the action should be computed deterministically
- Returns
A tuple of two sequences: the first contains the candidate actions, the second the associated scores (e.g., probabilities or Q-values).
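The shape of the return value can be illustrated with a small sketch. The helper `top_action_candidates` below is hypothetical and not part of Maze; it assumes a single discrete action head whose scores are given as a plain list, and an assumed action key `"action"`. It only shows how the two parallel sequences line up.

```python
from typing import Dict, List, Sequence, Tuple


def top_action_candidates(scores: Sequence[float],
                          num_candidates: int,
                          action_key: str = "action"
                          ) -> Tuple[List[Dict[str, int]], List[float]]:
    """Return the num_candidates best actions and their scores, best first."""
    # Rank action indices by descending score and keep the best num_candidates.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = ranked[:num_candidates]
    # Two parallel sequences: actions[i] was chosen because of top_scores[i].
    actions = [{action_key: i} for i in ranked]
    top_scores = [scores[i] for i in ranked]
    return actions, top_scores


# Usage: scores could be probabilities or Q-values.
actions, top_scores = top_action_candidates([0.1, 0.7, 0.2], num_candidates=2)
```

Here `actions[0]` is the highest-scoring candidate and `top_scores[0]` is its score; if fewer actions exist than `num_candidates`, the sketch simply returns all of them.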
- abstract needs_state() → bool
The policy implementation declares whether it operates solely on observations (needs_state returns False) or also requires the state object in order to compute the action.
Note that requiring the state object comes with performance implications, especially in multi-node distributed workloads, where both objects would need to be transferred over the network.
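A caller can use `needs_state()` to avoid fetching or transferring the state when the policy does not need it. The sketch below illustrates that pattern under stated assumptions: both policy classes, the `query_policy` helper, the `get_state` callback, and the `"best_action"` state field are hypothetical stand-ins and not part of Maze.

```python
from typing import Any, Callable, Dict, List, Optional


class ObservationOnlyPolicy:
    """Hypothetical policy that never looks at the state."""

    def needs_state(self) -> bool:
        return False

    def compute_action(self, observation: Dict[str, Any], maze_state: Optional[Any],
                       policy_id=None, deterministic: bool = False) -> Dict[str, int]:
        return {"action": 0}


class StateAwarePolicy:
    """Hypothetical policy that reads its decision from the state object."""

    def needs_state(self) -> bool:
        return True

    def compute_action(self, observation: Dict[str, Any], maze_state: Optional[Any],
                       policy_id=None, deterministic: bool = False) -> Dict[str, int]:
        return {"action": maze_state["best_action"]}


def query_policy(policy, observation: Dict[str, Any],
                 get_state: Callable[[], Any]) -> Dict[str, int]:
    # Fetch the (potentially large) state only if the policy asks for it.
    maze_state = get_state() if policy.needs_state() else None
    return policy.compute_action(observation, maze_state=maze_state, deterministic=True)


# Usage: record how often the state is actually fetched.
state_fetches: List[int] = []


def get_state() -> Dict[str, int]:
    state_fetches.append(1)
    return {"best_action": 3}


first = query_policy(ObservationOnlyPolicy(), {}, get_state)   # state is not fetched
second = query_policy(StateAwarePolicy(), {}, get_state)       # state is fetched once
```

Passing a callback rather than the state itself keeps the cost of producing or shipping the state off the path of observation-only policies.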
-
abstract