MultiStepActorCritic

class maze.train.trainers.common.actor_critic.actor_critic_trainer.MultiStepActorCritic(algorithm_config: Union[maze.train.trainers.a2c.a2c_algorithm_config.A2CAlgorithmConfig, maze.train.trainers.ppo.ppo_algorithm_config.PPOAlgorithmConfig], env: Union[maze.train.parallelization.distributed_env.distributed_env.BaseDistributedEnv, maze.core.env.structured_env.StructuredEnv, maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin, maze.core.log_stats.log_stats_env.LogStatsEnv], eval_env: [<class 'maze.train.parallelization.distributed_env.distributed_env.BaseDistributedEnv'>, <class 'maze.core.env.structured_env.StructuredEnv'>, <class 'maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin'>, <class 'maze.core.log_stats.log_stats_env.LogStatsEnv'>], model: maze.core.agent.torch_actor_critic.TorchActorCritic, model_selection: Optional[maze.train.trainers.common.model_selection.best_model_selection.BestModelSelection], initial_state: Optional[str] = None)

Base class for multi-step actor-critic trainers.

Parameters
  • algorithm_config – Algorithm parameters.

  • env – Distributed structured environment.

  • eval_env – Distributed structured environment used for evaluation.

  • model – Structured torch actor critic model.

  • initial_state – Path to the initial state (policy weights, critic weights, optimizer state).

  • model_selection – Optional model selection class, receives model evaluation results.

evaluate(deterministic: bool, repeats: int) → None

Evaluate the model on the evaluation environment (eval_env).

Parameters
  • deterministic – If True, select actions deterministically; otherwise sample them stochastically.

  • repeats – Number of evaluation episodes to average over.
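
The effect of the two parameters can be illustrated with a minimal, library-independent sketch. It assumes a discrete policy given as a list of action probabilities; `select_action`, `evaluate_policy`, and `run_episode` are hypothetical names and not part of the Maze API. Unlike `evaluate`, which returns None, the sketch returns the averaged reward so that the result is visible.

```python
import random
from typing import Callable, List


def select_action(probs: List[float], deterministic: bool, rng: random.Random) -> int:
    """Pick an action index from a discrete probability distribution."""
    if deterministic:
        # Deterministic selection: always take the most probable action.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Stochastic selection: sample an action according to its probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def evaluate_policy(run_episode: Callable[[bool], float], deterministic: bool, repeats: int) -> float:
    """Average the episode reward over `repeats` evaluation episodes."""
    rewards = [run_episode(deterministic) for _ in range(repeats)]
    return sum(rewards) / len(rewards)


# Toy usage (assumption): a one-step "episode" that pays 1.0 when action 1 is chosen.
rng = random.Random(0)
probs = [0.1, 0.7, 0.2]


def run_episode(deterministic: bool) -> float:
    return 1.0 if select_action(probs, deterministic, rng) == 1 else 0.0


mean_reward = evaluate_policy(run_episode, deterministic=True, repeats=5)
```

With deterministic selection every episode picks the same action, so repeating only matters when the environment itself is stochastic; with stochastic selection, a larger `repeats` reduces the variance of the averaged result.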

load_state(file_path: Union[str, BinaryIO]) → None

Implementation of Trainer. Loads the state from the given file path or binary stream.

load_state_dict(state_dict: Dict) → None

Set the model and optimizer state.

Parameters
  • state_dict – The state dict.
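
How a path-or-stream loader and a state-dict setter can fit together is sketched below. This is an illustration under stated assumptions, not Maze's implementation: `ToyTrainer`, the `"model"` and `"optimizer"` keys, and the use of `pickle` as the serialization format are all hypothetical. The actual file format and state-dict layout are not specified in this reference.

```python
import io
import pickle
from typing import BinaryIO, Dict, Union


class ToyTrainer:
    """Hypothetical trainer holding model and optimizer state as plain dicts."""

    def __init__(self) -> None:
        self.model_state: Dict = {}
        self.optimizer_state: Dict = {}

    def state_dict(self) -> Dict:
        # Bundle both parts of the training state (key names are an assumption).
        return {"model": self.model_state, "optimizer": self.optimizer_state}

    def load_state_dict(self, state_dict: Dict) -> None:
        # Restore the model state and the optimizer state together.
        self.model_state = state_dict["model"]
        self.optimizer_state = state_dict["optimizer"]

    def load_state(self, file_path: Union[str, BinaryIO]) -> None:
        # Accept either a path on disk or an already opened binary stream.
        if isinstance(file_path, str):
            with open(file_path, "rb") as stream:
                state_dict = pickle.load(stream)
        else:
            state_dict = pickle.load(file_path)
        self.load_state_dict(state_dict)


# Round trip through an in-memory binary stream.
source = ToyTrainer()
source.model_state = {"policy_weight": [0.5, -0.25]}
source.optimizer_state = {"lr": 0.001}
buffer = io.BytesIO(pickle.dumps(source.state_dict()))

restored = ToyTrainer()
restored.load_state(buffer)
```

Restoring the optimizer state alongside the model weights matters when training is resumed, since optimizers typically carry per-parameter statistics that would otherwise restart from scratch.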

train() → None

Train the policy using synchronous advantage actor-critic.
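
The "multi-step advantage" idea can be sketched with a common textbook formulation: discounted returns over a short rollout, bootstrapped from the critic's value of the state after the last step, minus the critic's per-step value estimates. This is not taken from the Maze source; `n_step_advantages` and its parameters are hypothetical names, and the sketch assumes a single rollout with no episode boundary inside it.

```python
from typing import List


def n_step_advantages(rewards: List[float], values: List[float],
                      bootstrap_value: float, gamma: float) -> List[float]:
    """Compute bootstrapped multi-step advantages for one rollout.

    rewards[t] and values[t] belong to step t; bootstrap_value is the
    critic's estimate for the state reached after the final step.
    """
    returns: List[float] = []
    running = bootstrap_value
    # Accumulate discounted returns backwards through the rollout.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage = multi-step return minus the critic's value estimate.
    return [ret - value for ret, value in zip(returns, values)]


advantages = n_step_advantages(
    rewards=[1.0, 0.0, 1.0],
    values=[0.5, 0.5, 0.5],
    bootstrap_value=0.0,
    gamma=0.5,
)
# advantages == [0.75, 0.0, 0.5]
```

In an actor-critic update of this kind, the advantages typically weight the policy-gradient term while the multi-step returns serve as regression targets for the critic.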