ESAlgorithmConfig

class maze.train.trainers.es.es_algorithm_config.ESAlgorithmConfig(n_rollouts_per_update: int, n_timesteps_per_update: int, max_epochs: int, max_steps: int, optimizer: Any, l2_penalty: float, noise_stddev: float)

Algorithm parameters for the evolution strategies (ES) model.

l2_penalty: float: L2 weight regularization coefficient.

max_epochs: int: The number of epochs to train before termination. Pass 0 to train indefinitely.

max_steps: int: Maximum number of steps per episode rollout. Set to 0 to disable the limit.
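
As an illustration of the "0 disables the limit" convention shared by max_epochs and max_steps, here is a minimal sketch. The helper names epoch_range and truncate_episode are hypothetical and not part of Maze; they only show one plausible reading of the two descriptions above.

```python
import itertools
from typing import Iterator


def epoch_range(max_epochs: int) -> Iterator[int]:
    """Yield epoch indices. Assumed reading: 0 means train indefinitely."""
    if max_epochs == 0:
        return itertools.count()  # unbounded counter: 0, 1, 2, ...
    return iter(range(max_epochs))


def truncate_episode(episode_length: int, max_steps: int) -> int:
    """Number of steps actually rolled out. Assumed reading: 0 disables the cap."""
    if max_steps == 0:
        return episode_length
    return min(episode_length, max_steps)
```

With max_epochs=0 the caller has to stop the loop by other means, for example by interrupting training manually.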

n_rollouts_per_update: int: Minimum number of episode rollouts per training iteration (=epoch).

n_timesteps_per_update: int: Minimum number of cumulative env steps per training iteration (=epoch). A training iteration finishes only once both the given number of episode rollouts AND the given number of steps have been reached. One of the two parameters can be set to 0.
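
The combined stopping rule for one training iteration can be sketched as follows. This is a minimal, self-contained illustration and not Maze's implementation: UpdateBudget, update_complete and rollouts_needed are hypothetical names that only mirror the two documented fields.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UpdateBudget:
    """Hypothetical stand-in holding the two documented minimums."""
    n_rollouts_per_update: int
    n_timesteps_per_update: int


def update_complete(budget: UpdateBudget, n_rollouts: int, n_timesteps: int) -> bool:
    """True once BOTH minimums are met; a minimum of 0 is trivially met."""
    return (n_rollouts >= budget.n_rollouts_per_update
            and n_timesteps >= budget.n_timesteps_per_update)


def rollouts_needed(budget: UpdateBudget, episode_lengths: Sequence[int]) -> int:
    """Count the episodes consumed from the given lengths until the update is complete.

    If the sequence runs out first, the number of episodes consumed so far is returned.
    """
    n_rollouts = 0
    n_timesteps = 0
    for length in episode_lengths:
        if update_complete(budget, n_rollouts, n_timesteps):
            break
        n_rollouts += 1
        n_timesteps += length
    return n_rollouts
```

For example, with a minimum of 2 rollouts and 100 steps and episodes of 30 steps each, 4 episodes are collected: the rollout minimum is met after 2 episodes, but the step minimum only after the fourth.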

noise_stddev: float: The scaling factor of the random noise applied during training.

optimizer: Any: The optimizer used to update the policy based on the sampled gradient.
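
To show how noise_stddev and l2_penalty typically enter an ES update, here is a sketch of a textbook gradient estimator with antithetic (mirrored) sampling. It is an assumption about the general technique, not a description of Maze's internals: es_gradient, n_pairs and the fitness callable are hypothetical, and in practice the optimizer would consume the returned direction.

```python
import random
from typing import Callable, List


def es_gradient(theta: List[float],
                fitness: Callable[[List[float]], float],
                n_pairs: int,
                noise_stddev: float,
                l2_penalty: float,
                rng: random.Random) -> List[float]:
    """Estimate an ascent direction for fitness(theta) - 0.5 * l2_penalty * ||theta||^2."""
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_pairs):
        # Sample a standard normal perturbation and evaluate both mirrored points.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        f_plus = fitness([t + noise_stddev * e for t, e in zip(theta, eps)])
        f_minus = fitness([t - noise_stddev * e for t, e in zip(theta, eps)])
        # Finite-difference slope along eps, accumulated per parameter.
        scale = (f_plus - f_minus) / (2.0 * noise_stddev)
        for i in range(dim):
            grad[i] += scale * eps[i]
    # Average over the sampled pairs and subtract the L2 regularization gradient.
    return [g / n_pairs - l2_penalty * t for g, t in zip(grad, theta)]
```

A larger noise_stddev explores a wider neighbourhood of the current parameters at the cost of a coarser gradient estimate, while l2_penalty pulls the parameters towards zero on every update.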