ImpalaAlgorithmConfig
-
class maze.train.trainers.impala.impala_algorithm_config.ImpalaAlgorithmConfig(n_epochs: int, epoch_length: int, deterministic_eval: bool, eval_repeats: int, eval_concurrency: int, queue_out_of_sync_factor: float, patience: int, n_rollout_steps: int = 50, actors_batch_size: int = 2, num_actors: int = 2, lr: float = 0.0002, gamma: float = 0.98, policy_loss_coef: float = 1.0, value_loss_coef: float = 0.5, entropy_coef: float = 0.00025, max_grad_norm: float = 0, vtrace_clip_rho_threshold: float = 1.0, vtrace_clip_pg_rho_threshold: float = 1.0, reward_clipping: str = 'abs_one', device: str = 'cpu')
Algorithm parameters for Impala.
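In the signature, the first seven parameters have no default and must always be supplied, while the rest fall back to the listed defaults. The sketch below illustrates this with a local stand-in dataclass, `ImpalaAlgorithmConfigSketch`, which is hypothetical and only copies the documented field names and defaults so that it runs without Maze installed. With Maze available, you would pass the same keyword arguments to the real class. The values chosen for the required fields are arbitrary and are not tuning recommendations.

```python
from dataclasses import dataclass


@dataclass
class ImpalaAlgorithmConfigSketch:
    """Hypothetical stand-in copying the documented ImpalaAlgorithmConfig fields."""

    # Required arguments (no default in the documented signature).
    n_epochs: int
    epoch_length: int
    deterministic_eval: bool
    eval_repeats: int
    eval_concurrency: int
    queue_out_of_sync_factor: float
    patience: int
    # Optional arguments, using the documented defaults.
    n_rollout_steps: int = 50
    actors_batch_size: int = 2
    num_actors: int = 2
    lr: float = 0.0002
    gamma: float = 0.98
    policy_loss_coef: float = 1.0
    value_loss_coef: float = 0.5
    entropy_coef: float = 0.00025
    max_grad_norm: float = 0
    vtrace_clip_rho_threshold: float = 1.0
    vtrace_clip_pg_rho_threshold: float = 1.0
    reward_clipping: str = "abs_one"
    device: str = "cpu"


# Arbitrary illustrative values: every required field is given, and one
# optional field (num_actors) is overridden. The remaining fields keep their defaults.
config = ImpalaAlgorithmConfigSketch(
    n_epochs=10,
    epoch_length=100,
    deterministic_eval=True,
    eval_repeats=2,
    eval_concurrency=1,
    queue_out_of_sync_factor=1.0,
    patience=15,
    num_actors=4,
)
```

Omitting any of the seven required fields raises a `TypeError` when the dataclass is constructed.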
-
device: str = 'cpu'
Device of the learner, either 'cpu' or 'cuda'. Note that the actors collecting rollouts always run on CPU.
-
queue_out_of_sync_factor: float
This factor multiplied by actors_batch_size gives the size of the queue that holds the actor outputs collected by the learner. Consequently, any computed rollout can be at most (queue_out_of_sync_factor + num_actors / actors_batch_size) out of sync with the learner policy.
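The sketch below works through the queue-sizing rule with concrete numbers. It uses the `num_actors` and `actors_batch_size` parameters from the signature. The helper `impala_queue_sizing` and the `QueueSizing` container are hypothetical and are not part of Maze. Truncating the queue size to an integer is also an assumption about how a fractional product would be handled.

```python
from dataclasses import dataclass


@dataclass
class QueueSizing:
    """Hypothetical container for the two quantities derived from the config."""

    queue_size: int          # capacity of the learner's input queue (in rollouts)
    max_out_of_sync: float   # upper bound on how stale a rollout's policy can be


def impala_queue_sizing(queue_out_of_sync_factor: float,
                        actors_batch_size: int = 2,
                        num_actors: int = 2) -> QueueSizing:
    """Apply the documented formulas; int() truncation is an assumption."""
    # Queue capacity = queue_out_of_sync_factor * actors_batch_size.
    queue_size = int(queue_out_of_sync_factor * actors_batch_size)
    # Staleness bound = factor + num_actors / actors_batch_size.
    max_out_of_sync = queue_out_of_sync_factor + num_actors / actors_batch_size
    return QueueSizing(queue_size, max_out_of_sync)


# With the default actors_batch_size=2 and num_actors=2 and a factor of 2.0,
# the queue holds 4 rollouts and the bound is 2.0 + 2 / 2 = 3.0.
sizing = impala_queue_sizing(2.0)
```

Raising the factor lets actors run further ahead of the learner, which improves throughput, but it also lets the behavior policy drift further from the learner policy. The V-trace clipping thresholds in the signature exist to correct for this kind of off-policy lag.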
-
reward_clipping: str = 'abs_one'
The type of reward clipping to use. Options: 'abs_one', 'soft_asymmetric', 'None'.
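The following sketch applies each documented clipping mode to a scalar reward, to show how the options differ. Maze's own implementation is not shown on this page, so this is an assumption-based illustration. 'abs_one' is taken to be a hard clip to [-1, 1]. The 'soft_asymmetric' formula is borrowed from DeepMind's scalable_agent reference implementation of IMPALA: tanh(r / 5) rescaled by 5, with negative rewards further scaled by 0.3. The helper `clip_reward` is hypothetical.

```python
import math


def clip_reward(reward: float, mode: str = "abs_one") -> float:
    """Hypothetical helper illustrating the documented reward_clipping options."""
    if mode == "abs_one":
        # Hard clip into the interval [-1, 1].
        return max(-1.0, min(1.0, reward))
    if mode == "soft_asymmetric":
        # Assumed formula (from the scalable_agent IMPALA code): squash smoothly
        # with tanh(r / 5), down-weight negative rewards by 0.3, then rescale by 5.
        # The result lies in [-1.5, 5.0].
        squeezed = math.tanh(reward / 5.0)
        if reward < 0:
            squeezed *= 0.3
        return squeezed * 5.0
    if mode == "None":
        # The string 'None' disables clipping; the reward passes through unchanged.
        return reward
    raise ValueError(f"unknown reward_clipping mode: {mode!r}")
```

Compared with the hard clip, the soft variant keeps large rewards distinguishable from small ones while still bounding them. It also makes penalties count less than gains of the same magnitude.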
-