MultiStepA2C

class maze.train.trainers.a2c.a2c_trainer.MultiStepA2C(algorithm_config: maze.train.trainers.a2c.a2c_algorithm_config.A2CAlgorithmConfig, env: Union[maze.train.parallelization.distributed_env.distributed_env.BaseDistributedEnv, maze.core.env.structured_env.StructuredEnv, maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin, maze.core.log_stats.log_stats_env.LogStatsEnv], eval_env: [<class 'maze.train.parallelization.distributed_env.distributed_env.BaseDistributedEnv'>, <class 'maze.core.env.structured_env.StructuredEnv'>, <class 'maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin'>, <class 'maze.core.log_stats.log_stats_env.LogStatsEnv'>], model: maze.core.agent.torch_actor_critic.TorchActorCritic, model_selection: Optional[maze.train.trainers.common.model_selection.best_model_selection.BestModelSelection], initial_state: Optional[str] = None)

Multi-step advantage actor-critic (A2C) trainer.

Parameters
  • algorithm_config – Algorithm parameters.

  • env – Distributed structured environment used for training.

  • eval_env – Distributed structured environment used for evaluation.

  • model – Structured torch actor critic model.

  • model_selection – Optional model selection class that receives the model evaluation results.

  • initial_state – Optional path to an initial state (policy weights, critic weights, optimizer state).
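
To illustrate what "multi-step" refers to in an advantage actor-critic setting, here is a minimal sketch of the textbook n-step return and advantage computation. It is not taken from the Maze implementation: the helper names n_step_returns and advantages, their arguments, and the convention that dones[t] marks the last step of an episode are all assumptions made for this example.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float],
                   dones: Sequence[bool],
                   bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted multi-step returns for one rollout (hypothetical helper).

    rewards[t] is the reward received at step t, dones[t] is True if the
    episode ended at step t, and bootstrap_value is the critic's estimate
    for the state that follows the last step of the rollout.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk the rollout backwards, accumulating the discounted return.
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Nothing after a terminal step contributes to its return.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: Sequence[float], values: Sequence[float]) -> List[float]:
    """Advantage estimate: multi-step return minus the critic's value."""
    return [ret - val for ret, val in zip(returns, values)]
```

In a typical actor-critic update, the policy loss weights each action's log-probability by its advantage, while the critic is regressed toward the multi-step returns. How MultiStepA2C combines these terms is governed by algorithm_config and is not shown here.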