AgentIntegration

class maze.core.agent_integration.agent_integration.AgentIntegration(policy: maze.core.agent.policy.Policy, action_conversions: Dict[Union[str, int], maze.core.env.action_conversion.ActionConversionInterface], observation_conversions: Dict[Union[str, int], maze.core.env.observation_conversion.ObservationConversionInterface], num_candidates: int = 1, wrapper_types: Optional[List[Type[maze.core.wrappers.wrapper.Wrapper]]] = None, wrapper_kwargs: Optional[List[Dict[str, Any]]] = None, renderer: Optional[maze.core.rendering.renderer.Renderer] = None)

Encapsulates an agent, space interfaces and a stack of wrappers, to make the agent’s MazeActions accessible to an external env.

The external env should supply states to the agent integration object and can query it for the agent’s MazeActions. The agent with the supplied policy (or multiple policies) runs on a separate thread.

Note that the two threads (the main thread running this wrapper, and the second thread running the agent, wrappers, etc.) never run in parallel: one of them is always suspended. This is enforced using queues. Either the main thread runs while the agent thread waits for the next state to be passed in, or the agent thread runs (computing the MazeAction) while the main thread waits for the MazeAction to be passed back. Once the MazeAction has been handed over, the agent thread is suspended again until the next state arrives via the queue.

The queues have a maximum size of one, which enforces that only one step can be taken at a time.
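The handoff described above can be illustrated with a minimal, self-contained sketch built on Python's standard queue and threading modules. It is not the Maze implementation: toy_policy, agent_loop, get_action and the STOP sentinel are hypothetical names introduced only for this illustration, and the real class additionally routes states through conversion interfaces and the wrapper stack.

```python
import queue
import threading

# Two single-slot queues: at most one state or one action is in flight at a time.
state_queue = queue.Queue(maxsize=1)   # main thread -> agent thread
action_queue = queue.Queue(maxsize=1)  # agent thread -> main thread
STOP = object()  # hypothetical sentinel used to end the agent loop


def toy_policy(state):
    """Hypothetical stand-in for a policy: doubles the numeric state."""
    return state * 2


def agent_loop():
    """Runs on the agent thread: wait for a state, compute, hand the action back."""
    while True:
        state = state_queue.get()  # suspended until the main thread supplies a state
        if state is STOP:
            break
        action_queue.put(toy_policy(state))


def get_action(state):
    """Runs on the main thread: hand over a state, then wait for the action."""
    state_queue.put(state)
    return action_queue.get()  # main thread is suspended while the agent computes


agent_thread = threading.Thread(target=agent_loop, daemon=True)
agent_thread.start()

actions = [get_action(s) for s in (1, 2, 3)]

# Shut the agent thread down cleanly.
state_queue.put(STOP)
agent_thread.join()
```

Because each side blocks on a queue read immediately after writing to the other queue, only one of the two threads makes progress at any moment, which is the property the note above describes.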

Parameters

policy – Structured policy working with structured environments. When querying for a MazeAction, the policy to run can be selected via the actor_id parameter, whose first part corresponds to the policy_id.
action_conversions – Action conversion interfaces for the respective policies.
observation_conversions – Observation conversion interfaces for the respective policies.
num_candidates – Number of MazeAction candidates to get from the policy. If greater than 1, multiple MazeActions are returned, wrapped in MazeActionCandidates.
wrapper_types – Which wrappers should be run as part of the agent’s stack.
wrapper_kwargs – Optional arguments to pass to the given wrappers on instantiation.

finish_rollout(maze_state: Any, reward: Union[float, numpy.ndarray, Any], done: bool, info: Dict[Any, Any], events: Optional[List[maze.core.events.event_record.EventRecord]] = None)

Should be called when the rollout is finished. While this has no effect on the provided MazeActions, it passes an env reset call through the wrapper stack, enabling the wrappers to do any work they normally do at the end of an episode (such as writing trajectory data).

Parameters

maze_state – Final state of the rollout
reward – Reward for the previous step (can be None in the initial step)
done – Whether the external environment is done
info – Info dictionary
events – List of events to be recorded for this step (mainly useful for statistics and event logs)

get_maze_action(maze_state: Any, reward: Union[None, float, numpy.ndarray, Any], done: bool, info: Union[None, Dict[Any, Any]], events: Optional[List[maze.core.events.event_record.EventRecord]] = None, actor_id: Tuple[Union[str, int], int] = (0, 0))

Query the agent for a MazeAction derived from the given state.

Passes the state etc. to the agent’s thread, where it is integrated into an ordinary env rollout loop. In the first step, an env reset call is propagated through the env wrapper stack on the agent’s thread.

Parameters

maze_state – Current state of the environment.
reward – Reward for the previous step (can be None in the initial step)
done – Whether the external environment is done
info – Info dictionary
events – List of events to be recorded for this step (mainly useful for statistics and event logs)
actor_id – Optional ID of the actor to run next (consisting of policy_id and agent_id)

Returns

MazeAction from the agent
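
The calling pattern implied by get_maze_action and finish_rollout can be sketched as below. To keep the example runnable without Maze installed, it uses StubAgentIntegration, a hypothetical stand-in that only mirrors the two documented method signatures and records calls. The external_env_step function and the dictionary used as a placeholder MazeAction are likewise assumptions made for illustration. With a real AgentIntegration instance, the loop body should look the same, though this has not been verified against the library here.

```python
from typing import Any, Dict, List, Optional, Tuple, Union


class StubAgentIntegration:
    """Hypothetical stand-in mirroring the two documented method signatures.

    This is NOT the Maze implementation; it only records calls so the loop runs.
    """

    def __init__(self) -> None:
        self.calls: List[str] = []

    def get_maze_action(self, maze_state: Any, reward: Optional[float], done: bool,
                        info: Optional[Dict[Any, Any]], events: Optional[list] = None,
                        actor_id: Tuple[Union[str, int], int] = (0, 0)) -> Any:
        self.calls.append("get_maze_action")
        return {"move": 1}  # placeholder MazeAction (assumption)

    def finish_rollout(self, maze_state: Any, reward: Any, done: bool,
                       info: Dict[Any, Any], events: Optional[list] = None) -> None:
        self.calls.append("finish_rollout")


def external_env_step(state: int, maze_action: Dict[str, int]):
    """Hypothetical external env: advances a counter and is done once it reaches 3."""
    next_state = state + maze_action["move"]
    return next_state, 1.0, next_state >= 3, {}


agent_integration = StubAgentIntegration()

# In the initial step there is no previous reward or info yet, so both are None.
state, reward, done, info = 0, None, False, None

while not done:
    # actor_id is (policy_id, agent_id); (0, 0) is the documented default.
    maze_action = agent_integration.get_maze_action(state, reward, done, info, actor_id=(0, 0))
    state, reward, done, info = external_env_step(state, maze_action)

# Let the wrapper stack do its end-of-episode work (e.g. writing trajectory data).
agent_integration.finish_rollout(state, reward, done, info)
```

The external env stays in control of the loop: it asks for one MazeAction per state and calls finish_rollout once with the final state after the last step.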