ImpalaLearner
-
class maze.train.trainers.impala.impala_learner.ImpalaLearner(eval_env: Union[maze.train.parallelization.distributed_env.distributed_env.BaseDistributedEnv, maze.core.env.structured_env.StructuredEnv, maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin, maze.core.log_stats.log_stats_env.LogStatsEnv], model: maze.core.agent.torch_actor_critic.TorchActorCritic, n_rollout_steps: int)
Learner agent for Impala. The learner exists only once (in the main thread) and is in charge of computing the loss as well as computing and backpropagating the gradients. Furthermore, in contrast to the actors, it holds the critic network.
-
evaluate(deterministic: bool, repeats: int) → None
Perform evaluation on the eval env.
- Parameters
deterministic – whether to select actions deterministically (True) or to sample them stochastically (False)
repeats – number of evaluation episodes to average over
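
The effect of the two parameters can be illustrated with a minimal, library-independent sketch. It is not the Maze implementation: `select_action`, `toy_episode` and `evaluate_sketch` are hypothetical names, the toy episode stands in for a real eval env, and the sketch returns the mean episode return for illustration, whereas `evaluate` itself returns None.

```python
import random
from typing import Sequence


def select_action(probs: Sequence[float], deterministic: bool, rng: random.Random) -> int:
    """Return the most likely action if deterministic, otherwise sample from probs."""
    if deterministic:
        return max(range(len(probs)), key=lambda i: probs[i])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def toy_episode(probs: Sequence[float], deterministic: bool, rng: random.Random,
                n_steps: int = 5) -> float:
    """Hypothetical episode: every step pays a reward of 1.0 when action 1 is chosen."""
    return sum((1.0 for _ in range(n_steps)
                if select_action(probs, deterministic, rng) == 1), 0.0)


def evaluate_sketch(probs: Sequence[float], deterministic: bool, repeats: int,
                    seed: int = 0) -> float:
    """Average the episode return over `repeats` evaluation episodes."""
    rng = random.Random(seed)
    returns = [toy_episode(probs, deterministic, rng) for _ in range(repeats)]
    return sum(returns) / len(returns)


# Assumed fixed policy that prefers action 1.
policy_probs = [0.2, 0.8]
greedy_mean = evaluate_sketch(policy_probs, deterministic=True, repeats=3)
sampled_mean = evaluate_sketch(policy_probs, deterministic=False, repeats=10)
```

With deterministic selection every episode of this toy setup is identical, so additional repeats only matter when the policy or the environment is stochastic.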
-
learner_rollout_on_agent_output(actors_output: maze.train.parallelization.distributed_actors.actor.AgentOutput) → maze.train.trainers.impala.impala_learner.LearnerOutput
Compute the values and the action logits using the learner's network parameters and the actors' rollouts. Because the observations and actions were already recorded by the actors, no env is stepped here.
- Parameters
actors_output – The collected and batched actors' output, including the env_outputs such as observations and actions
- Returns
A LearnerOutput named tuple consisting of (values, detached_values, actions_logits, n_critics)
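
The idea of re-evaluating recorded rollouts without stepping an env can be sketched in plain Python. This is an assumption-laden illustration, not the Maze code: `LearnerOutputSketch`, `learner_rollout_sketch`, `policy_fn` and `critic_fn` are hypothetical names, only the four field names come from the description above, and plain lists stand in for tensors.

```python
from typing import Callable, List, NamedTuple, Sequence


class LearnerOutputSketch(NamedTuple):
    """Hypothetical stand-in mirroring the four fields listed under Returns."""
    values: List[float]
    detached_values: List[float]
    actions_logits: List[List[float]]
    n_critics: int


def learner_rollout_sketch(
    observations: Sequence[Sequence[float]],
    policy_fn: Callable[[Sequence[float]], List[float]],
    critic_fn: Callable[[Sequence[float]], float],
) -> LearnerOutputSketch:
    """Re-evaluate recorded observations with the learner's functions.

    No environment is stepped: the observations come from the actors' rollouts.
    """
    actions_logits = [policy_fn(obs) for obs in observations]
    values = [critic_fn(obs) for obs in observations]
    # With tensors this would be a copy excluded from gradient computation;
    # with plain floats a list copy is the closest analogue.
    detached_values = list(values)
    return LearnerOutputSketch(values, detached_values, actions_logits, n_critics=1)


# Assumed toy data: two recorded observations and trivial learner functions.
recorded_obs = [[1.0, 2.0], [0.5, -1.0]]
out = learner_rollout_sketch(
    recorded_obs,
    policy_fn=lambda obs: [obs[0], obs[1]],  # logits equal the raw features
    critic_fn=lambda obs: sum(obs),          # value equals the feature sum
)
```

The fields can then be read by name, for example `out.values` or `out.actions_logits`, which is the main convenience of returning a named tuple.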
-