Trainers and Training Runners¶
This page contains the reference documentation for trainers and training runners:
General¶
These are general interfaces, classes and utility functions for trainers and training runners:
- Interface for trainers.
- Base class for training runner implementations.
- Top-level configuration structure.
- Model configuration structure.
- Base class for all algorithm-specific configurations.
- Base class for model selection strategies.
- Best model selection strategy.
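As a minimal sketch of how a best-model selection strategy might behave (class and method names here are hypothetical, not the library's actual API): keep the model state whose evaluation reward is highest so far.

```python
import math


class BestModelSelection:
    """Hypothetical sketch of a best-model selection strategy:
    retain the model state with the highest evaluation reward."""

    def __init__(self):
        self.best_reward = -math.inf
        self.best_state = None

    def update(self, reward: float, model_state: dict) -> bool:
        """Record an evaluation result; return True if it is a new best."""
        if reward > self.best_reward:
            self.best_reward = reward
            self.best_state = dict(model_state)  # snapshot the parameters
            return True
        return False
```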
Trainers¶
These are interfaces, classes and utility functions for built-in trainers:
Actor-Critics (AC)¶
- Base class for multi-step actor-critic trainers.
- Event interface defining statistics emitted by the A2CTrainer.
- Multi-step advantage actor-critic (A2C).
- Algorithm parameters for the multi-step A2C model.
- Multi-step Proximal Policy Optimization (PPO).
- Algorithm parameters for the multi-step PPO model.
|
- Multi-step advantage actor-critic trainer for Impala.
- Algorithm parameters for Impala.
- Events specific to the Impala algorithm, recording and analysing its behaviour in more detail.
- Learner agent for Impala.
- Batches the collected output in time-major format.
- Computes action log-probs from policy logits, actions, and action_spaces.
- V-trace for softmax policies.
- V-trace from log importance weights.
- Computes the log_rhos for the V-trace calculation from the selected log-probs of multi-discrete actions under the behavior and target policies.
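The V-trace computation from log importance weights can be sketched as follows. The function name, argument names, and time-major `[T, B]` layout are assumptions for illustration, not the library's actual API; the recursion follows the standard V-trace definition (clipped importance weights, backward accumulation of temporal-difference corrections).

```python
import numpy as np


def vtrace_from_log_rhos(log_rhos, discounts, rewards, values, bootstrap_value,
                         clip_rho_threshold=1.0, clip_c_threshold=1.0):
    """Sketch of V-trace targets from log importance weights, time-major [T, B].

    log_rhos: log pi_target(a|x) - log pi_behavior(a|x) per step.
    Returns the V-trace value targets vs_t.
    """
    rhos = np.minimum(clip_rho_threshold, np.exp(log_rhos))
    cs = np.minimum(clip_c_threshold, np.exp(log_rhos))
    # values shifted one step forward, bootstrapped at the final step
    values_t_plus_1 = np.concatenate([values[1:], bootstrap_value[None]], axis=0)
    deltas = rhos * (rewards + discounts * values_t_plus_1 - values)
    # backward recursion: vs_t - V(x_t) = delta_t + gamma_t * c_t * (vs_{t+1} - V(x_{t+1}))
    acc = np.zeros_like(bootstrap_value)
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    return vs_minus_v + values
```

When behavior and target policies coincide (log_rhos of zero), the targets reduce to ordinary n-step returns.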
Evolutionary Strategies (ES)¶
- Trainer class for OpenAI Evolution Strategies.
- Algorithm parameters for the evolution strategies model.
- Event interface defining statistics emitted by the ESTrainer.
- Base class of ES training master runners (serves as a basis for dev and other runners).
- Runner config for single-threaded training, based on ESDummyDistributedRollouts.
- A fixed-length vector of deterministically generated pseudo-random floats.
- Abstract base class of an optimizer to be used with ES.
- Stochastic gradient descent with momentum.
- Adam optimizer.
- Result structure for distributed rollouts.
- Implementation of the ES distribution that runs the rollouts synchronously in the same process.
- Abstract base class of an ES rollout distribution.
- Exception raised when the current rollout is intentionally aborted.
- Binds rollout generation to a single worker environment by implementing it as a Wrapper class.
- Get the parameters of all sub-policies as a single flat vector.
- Overwrite the parameters of all sub-policies with a single flat vector.
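The flat-vector parameter interface listed above can be sketched as a pair of helpers; the names `get_flat`/`set_flat` and the assumption that parameters are numpy arrays are illustrative only.

```python
import numpy as np


def get_flat(params):
    """Concatenate a list of parameter arrays into one flat vector."""
    return np.concatenate([p.ravel() for p in params])


def set_flat(params, flat):
    """Overwrite the parameter arrays in place from a flat vector."""
    offset = 0
    for p in params:
        p[...] = flat[offset:offset + p.size].reshape(p.shape)
        offset += p.size
```

ES relies on exactly this view: it perturbs the flat vector with Gaussian noise, writes it back into the policy before each rollout, and applies the estimated gradient update on the flat vector.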
Imitation Learning (IL) and Learning from Demonstrations (LfD)¶
- Event interface defining statistics emitted by the imitation learning trainers.
- Abstract interface for imitation learning evaluation.
- Dev runner for imitation learning.
- A version of the in-memory dataset that loads all data in parallel.
- Data-loading worker used to map states to actual observations.
- Trajectory dataset for imitation learning.
- Trainer for behavioral cloning.
- Algorithm parameters for behavioral cloning.
- Evaluates a given policy on validation data.
- Loss function for behavioral cloning.
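A behavioral cloning loss for discrete actions can be sketched as the mean negative log-likelihood of the expert actions under the policy's softmax distribution. The function name and the `[batch, num_actions]` logits layout are assumptions, not the library's actual signature.

```python
import numpy as np


def behavioral_cloning_loss(logits, expert_actions):
    """Sketch of a discrete-action behavioral cloning loss:
    mean negative log-likelihood of expert actions under softmax(logits)."""
    # log-softmax with max-subtraction for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(expert_actions)), expert_actions]
    return nll.mean()
```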
Utilities¶
- Stack a list of dictionaries holding numpy arrays as values.
- Inverse of the stacking operation above.
- Computes the cumulative gradient norm of all provided parameters.
- Stack a list of dictionaries holding torch tensors as values.
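The dictionary-stacking utilities can be sketched for the numpy case as follows; the function names are hypothetical, and the sketch assumes all dictionaries share the same keys and per-key shapes.

```python
import numpy as np


def stack_numpy_dict_list(dict_list):
    """Stack a list of dicts of numpy arrays into one dict of stacked arrays."""
    return {key: np.stack([d[key] for d in dict_list]) for key in dict_list[0]}


def unstack_numpy_dict(stacked):
    """Inverse operation: split a dict of stacked arrays back into per-step dicts."""
    n = len(next(iter(stacked.values())))
    return [{key: value[i] for key, value in stacked.items()} for i in range(n)]
```

This pattern is what turns a rollout (a list of per-step observation dicts) into batched arrays for the trainer, and back again.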