ImpalaLearner

class maze.train.trainers.impala.impala_learner.ImpalaLearner(eval_env: Union[maze.train.parallelization.distributed_env.distributed_env.BaseDistributedEnv, maze.core.env.structured_env.StructuredEnv, maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin, maze.core.log_stats.log_stats_env.LogStatsEnv], model: maze.core.agent.torch_actor_critic.TorchActorCritic, n_rollout_steps: int)

Learner agent for Impala. The learner exists only once (in the main thread) and is in charge of computing the loss as well as computing and backpropagating the gradients. Furthermore, in contrast to the actors, it holds the critic network.

evaluate(deterministic: bool, repeats: int) → None

Perform evaluation on the evaluation environment (eval_env).

Parameters
  • deterministic – deterministic or stochastic action sampling (selection)

  • repeats – number of evaluation episodes to average over
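The effect of the two parameters can be illustrated with a minimal, self-contained sketch. It does not use the Maze API: select_action, run_episode and evaluate_policy are hypothetical helpers, and the one-step toy episode is an assumption made purely for illustration. The sketch contrasts deterministic selection (taking the most probable action) with stochastic sampling, and averages the episode reward over repeats episodes.

```python
import random
from typing import Sequence


def select_action(probs: Sequence[float], deterministic: bool, rng: random.Random) -> int:
    """Pick an action index from a probability vector (hypothetical helper)."""
    if deterministic:
        # Deterministic selection: always take the most probable action.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Stochastic selection: sample an action according to its probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run_episode(probs: Sequence[float], rewards: Sequence[float],
                deterministic: bool, rng: random.Random) -> float:
    """Toy one-step episode: the reward depends only on the chosen action."""
    return rewards[select_action(probs, deterministic, rng)]


def evaluate_policy(probs: Sequence[float], rewards: Sequence[float],
                    deterministic: bool, repeats: int, seed: int = 0) -> float:
    """Average the episode reward over `repeats` evaluation episodes."""
    rng = random.Random(seed)
    episode_rewards = [run_episode(probs, rewards, deterministic, rng)
                       for _ in range(repeats)]
    return sum(episode_rewards) / repeats
```

With deterministic=True every episode of this toy setup yields the same reward, so repeats only matters for stochastic sampling or for environments that are themselves random.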

learner_rollout_on_agent_output(actors_output: maze.train.parallelization.distributed_actors.actor.AgentOutput) → maze.train.trainers.impala.impala_learner.LearnerOutput
Compute the values and the action logits using the learner's network parameters and the actors' rollouts.

Thus we never step through an env here.

Parameters

actors_output – The collected and batched actors output, including the env_outputs such as observations and actions

Returns

A LearnerOutput named tuple consisting of (values, detached_values, actions_logits, n_critics)
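As a rough illustration of the idea (not Maze's implementation), the sketch below re-evaluates observations that the actors already recorded with the learner's own parameters, without stepping an environment. LearnerOutputSketch, dot and learner_rollout_sketch are hypothetical names; the linear toy "networks" and the plain-list field types are assumptions, and only the four field names are taken from the description above.

```python
from typing import List, NamedTuple, Sequence


class LearnerOutputSketch(NamedTuple):
    """Simplified stand-in mirroring the four documented fields."""
    values: List[float]
    detached_values: List[float]
    actions_logits: List[List[float]]
    n_critics: int


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def learner_rollout_sketch(observations: Sequence[Sequence[float]],
                           policy_weights: Sequence[Sequence[float]],
                           critic_weights: Sequence[float]) -> LearnerOutputSketch:
    """Evaluate toy linear policy and critic on recorded observations."""
    # One row of logits per recorded observation, one logit per action.
    actions_logits = [[dot(obs, w) for w in policy_weights] for obs in observations]
    # One critic value per recorded observation.
    values = [dot(obs, critic_weights) for obs in observations]
    # Plain Python has no autograd, so a copy stands in for a detached tensor.
    detached_values = list(values)
    return LearnerOutputSketch(values, detached_values, actions_logits, n_critics=1)
```

The input here plays the role of the batched actors output: the observations are fixed data, so the only thing that changes between learner updates is the set of parameters used to evaluate them.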