ImpalaLearner

class maze.train.trainers.impala.impala_learner.ImpalaLearner(eval_env: Union[maze.train.parallelization.distributed_env.distributed_env.BaseDistributedEnv, maze.core.env.structured_env.StructuredEnv, maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin, maze.core.log_stats.log_stats_env.LogStatsEnv], model: maze.core.agent.torch_actor_critic.TorchActorCritic, n_rollout_steps: int)

Learner agent for Impala. The learner exists only once (in the main thread) and is in charge of computing the loss as well as computing and backpropagating the gradients. Furthermore, in contrast to the actors, it holds the critic network.

evaluate(deterministic: bool, repeats: int) → None

Perform evaluation on the eval env.

Parameters

deterministic – whether to select actions deterministically (True) or to sample them stochastically (False)
repeats – number of evaluation episodes to average over
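
The two parameters can be illustrated with a minimal, library-independent sketch. It does not use Maze itself: ToyEnv, toy_policy and the standalone evaluate function below are hypothetical stand-ins, and unlike the documented method (which returns None) the sketch returns the mean episode return so the effect of repeats is visible.

```python
import random
from statistics import mean


class ToyEnv:
    """Hypothetical environment: reward 1.0 for action 1, episodes last 5 steps."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 5
        return float(self.t), reward, done


def toy_policy(observation):
    """Hypothetical policy: fixed action probabilities, independent of the observation."""
    return [0.2, 0.8]


def evaluate(env, policy, deterministic, repeats, rng=None):
    """Run `repeats` episodes and return the mean episode return (assumption of this sketch)."""
    rng = rng or random.Random(0)
    episode_returns = []
    for _ in range(repeats):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            probs = policy(obs)
            if deterministic:
                # Deterministic selection: always take the most probable action.
                action = max(range(len(probs)), key=probs.__getitem__)
            else:
                # Stochastic selection: sample an action from the distribution.
                action = rng.choices(range(len(probs)), weights=probs)[0]
            obs, reward, done = env.step(action)
            total += reward
        episode_returns.append(total)
    return mean(episode_returns)
```

With deterministic=True every episode of this toy setup yields the same return, so averaging over several repeats only matters when the policy or the environment is stochastic.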

learner_rollout_on_agent_output(actors_output: maze.train.parallelization.distributed_actors.actor.AgentOutput) → maze.train.trainers.impala.impala_learner.LearnerOutput

Compute the values and the action logits using the learner's network parameters and the actors' rollouts. Thus we never step through an env here.

Parameters: actors_output – The collected and batched actors output, including the env_outputs such as observations and actions
Returns: A LearnerOutput named tuple consisting of (values, detached_values, actions_logits, n_critics)
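
The idea of re-evaluating recorded rollouts with the learner's parameters, without stepping an env, can be sketched in plain Python. This is not the Maze implementation: ToyLearnerModel and the standalone learner_rollout function are hypothetical, the named tuple only mirrors the field names listed above, and "detaching" is simulated by copying the values since no autograd framework is involved.

```python
from collections import namedtuple

# Stand-in for the documented return type; only the field names come from the docs.
LearnerOutput = namedtuple(
    "LearnerOutput", ["values", "detached_values", "actions_logits", "n_critics"]
)


class ToyLearnerModel:
    """Hypothetical linear actor-critic over scalar observations."""

    def __init__(self, policy_weights, value_weight):
        self.policy_weights = policy_weights
        self.value_weight = value_weight

    def logits(self, obs):
        # One logit per action.
        return [w * obs for w in self.policy_weights]

    def value(self, obs):
        return self.value_weight * obs


def learner_rollout(model, recorded_observations):
    """Recompute values and logits for observations the actors already collected.

    No env is stepped: the observations are replayed through the learner's model.
    """
    values = [model.value(o) for o in recorded_observations]
    logits = [model.logits(o) for o in recorded_observations]
    return LearnerOutput(
        values=values,
        detached_values=list(values),  # copy stands in for a gradient-free view
        actions_logits=logits,
        n_critics=1,  # assumption: a single critic in this sketch
    )


out = learner_rollout(ToyLearnerModel([1.0, -1.0], 0.5), [0.0, 2.0, 4.0])
```

Because the logits come from the learner's current parameters rather than from the actors' possibly outdated copies, the two sets of logits can differ for the same observations.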