ESRolloutWorkerWrapper¶
-
class maze.train.trainers.es.distributed.es_rollout_wrapper.ESRolloutWorkerWrapper(*args, **kwds)¶ The rollout generation is bound to a single worker environment by implementing it as a Wrapper class.
-
clear_abort()¶ Clear the abort flag.
-
generate_evaluation(policy: maze.core.agent.torch_policy.TorchPolicy) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult¶ Generate a single evaluation rollout.
- Parameters
policy – Multi-step policy encapsulating the policy networks
- Returns
A result set with a single evaluation rollout.
-
generate_training(policy: maze.core.agent.torch_policy.TorchPolicy, noise_stddev: float) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult¶ Generate a single training sample, consisting of two rollouts, obtained by adding and subtracting the same random perturbation vector from the policy.
- Parameters
policy – Multi-step policy encapsulating the policy networks.
noise_stddev – The standard deviation of the applied parameter noise.
- Returns
A result set with a pair of rollouts generated by adding and subtracting the perturbation (antithetic sampling).
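The antithetic pairing described for generate_training can be illustrated with a minimal, standard-library sketch. It is not the Maze implementation: `antithetic_pair`, `toy_return`, and the flat list of parameters are hypothetical stand-ins, and the real method operates on the networks of a TorchPolicy and returns an ESRolloutResult.

```python
import random
from typing import Callable, List, Tuple


def antithetic_pair(
    params: List[float],
    noise_stddev: float,
    evaluate: Callable[[List[float]], float],
    rng: random.Random,
) -> Tuple[List[float], float, float]:
    """Evaluate params + noise and params - noise using one shared noise vector."""
    # Draw a single perturbation vector, scaled by the noise standard deviation.
    noise = [rng.gauss(0.0, noise_stddev) for _ in params]

    # Apply the same vector with opposite signs (the antithetic pair).
    plus = [p + n for p, n in zip(params, noise)]
    minus = [p - n for p, n in zip(params, noise)]

    return noise, evaluate(plus), evaluate(minus)


def toy_return(params: List[float]) -> float:
    """Hypothetical stand-in for an episode return (assumption, not Maze code)."""
    return -sum(p * p for p in params)


noise, reward_plus, reward_minus = antithetic_pair(
    params=[0.5, -1.0, 2.0],
    noise_stddev=0.1,
    evaluate=toy_return,
    rng=random.Random(0),
)
```

Because both rollouts share the same perturbation vector, the difference between the two returns depends only on the direction of the noise, which is what makes the pair useful as a single training sample.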
-
rollout(policy: maze.core.agent.torch_policy.TorchPolicy) → None¶ Use the passed policy to step the environment until it is done.
This method does not return any results; query the episode statistics instead to process them.
- Parameters
policy – Multi-step policy encapsulating the policy networks
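The pattern of a rollout that returns nothing and leaves its results in the episode statistics can be sketched as follows. Everything here is an assumption for illustration: `CountdownEnv`, its `episode_rewards` list, and the callable policy are hypothetical and do not reflect the Maze environment or statistics API.

```python
from typing import Callable, List, Tuple


class CountdownEnv:
    """Hypothetical toy environment that ends after a fixed number of steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.steps_left = horizon
        self._current_reward = 0.0
        # Stands in for episode statistics that callers query after a rollout.
        self.episode_rewards: List[float] = []

    def reset(self) -> int:
        self.steps_left = self.horizon
        self._current_reward = 0.0
        return self.steps_left

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.steps_left -= 1
        reward = float(action)
        self._current_reward += reward
        done = self.steps_left == 0
        if done:
            # Record the finished episode so it can be read later.
            self.episode_rewards.append(self._current_reward)
        return self.steps_left, reward, done


def rollout(env: CountdownEnv, policy: Callable[[int], int]) -> None:
    """Step the environment with the policy until the episode is done."""
    observation = env.reset()
    done = False
    while not done:
        action = policy(observation)
        observation, _, done = env.step(action)


env = CountdownEnv(horizon=5)
result = rollout(env, policy=lambda observation: 1)  # constant-action policy
total_reward = env.episode_rewards[-1]  # read the outcome from the statistics
```

Keeping the return value empty means the caller has a single place to read outcomes from, regardless of how many rollouts were run on the same environment.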
-
set_abort()¶ Set the abort flag to stop the running rollout (intended to be called from another thread).
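How set_abort and clear_abort might interact with a running rollout can be sketched with a thread-safe flag. This assumes a `threading.Event` checked once per step; the mechanism Maze actually uses is not documented here, and `AbortableWorker` is a hypothetical class, not the wrapper itself.

```python
import threading
import time


class AbortableWorker:
    """Hypothetical worker; only the flag handling mirrors the methods above."""

    def __init__(self) -> None:
        self._abort = threading.Event()  # assumption: flag is a threading.Event
        self.steps_done = 0

    def set_abort(self) -> None:
        """Request that the running rollout stops (safe to call from another thread)."""
        self._abort.set()

    def clear_abort(self) -> None:
        """Reset the flag so that the next rollout can run."""
        self._abort.clear()

    def rollout(self, max_steps: int) -> None:
        for _ in range(max_steps):
            # Check the flag before every step and stop as soon as it is set.
            if self._abort.is_set():
                break
            self.steps_done += 1
            time.sleep(0.001)  # simulate the cost of one environment step


worker = AbortableWorker()
thread = threading.Thread(target=worker.rollout, args=(10_000,))
thread.start()
time.sleep(0.05)       # let the rollout make some progress
worker.set_abort()     # called from the main thread while the rollout runs
thread.join(timeout=5.0)
aborted_early = worker.steps_done < 10_000
worker.clear_abort()   # without this, the next rollout would stop immediately
```

The flag stays set until it is cleared, which is why a separate clear_abort call is needed before the worker is reused.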
-