RandomPolicy

class maze.core.agent.random_policy.RandomPolicy(action_spaces_dict: Dict[Union[str, int], gym.spaces.Space])

Implements a random structured policy.

Parameters

action_spaces_dict – The action_spaces dict from the env
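
A minimal construction sketch. The sub-step key 0 and the Dict action space below are made-up example values, not part of this API; only RandomPolicy and the action_spaces_dict argument come from the signature above.

    import gym
    from maze.core.agent.random_policy import RandomPolicy

    # Hypothetical structured action space for a single sub-step (key 0).
    action_spaces_dict = {
        0: gym.spaces.Dict({
            "move": gym.spaces.Discrete(4),   # e.g. four movement directions
            "jump": gym.spaces.Discrete(2),   # e.g. jump / no jump
        })
    }

    policy = RandomPolicy(action_spaces_dict)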

compute_action(observation: Dict[str, numpy.ndarray], maze_state: Optional[Any], policy_id: Union[str, int] = None, deterministic: bool = False) → Dict[str, Union[int, numpy.ndarray]]

Query the policy that corresponds to the given ID for an action.

Parameters
  • observation – Current observation of the environment

  • maze_state – Current state of the environment (will always be None as needs_state() returns False)

  • policy_id – ID of the policy to query (does not need to be provided if the policies dict contains only one policy)

  • deterministic – Specify if the action should be computed deterministically

Returns

Next action to take
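
A usage sketch continuing the construction example above. The observation contents are arbitrary; a random policy presumably ignores them and samples from the action space.

    import numpy as np

    # Arbitrary dict-valued observation (contents are a placeholder).
    observation = {"position": np.zeros(2, dtype=np.float32)}

    action = policy.compute_action(observation, maze_state=None, policy_id=0)
    # action is a dict sampled from the structured action space,
    # e.g. {"move": 2, "jump": 1}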

compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: int, maze_state: Optional[Any] = None, policy_id: Union[str, int] = None, deterministic: bool = False) → Tuple[Sequence[Dict[str, Union[int, numpy.ndarray]]], Sequence[float]]

Implementation of the Policy interface.
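
A call sketch that follows the documented signature, continuing the examples above; the candidate count of 3 is arbitrary, and the result shapes are only what the return annotation promises.

    candidates, scores = policy.compute_top_action_candidates(
        observation, num_candidates=3, maze_state=None, policy_id=0
    )
    # candidates: a sequence of 3 action dicts; scores: a sequence of floats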

needs_state() → bool

This policy does not require the state() object to compute the action.
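
Because needs_state() returns False, callers may pass maze_state=None when querying this policy, as in the compute_action sketch above.

    assert policy.needs_state() is False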