Trainers and Training Runners

This page contains the reference documentation for trainers and training runners:

General

These are general interfaces, classes and utility functions for trainers and training runners:

Trainer

Interface for trainers.

TrainingRunner

Base class for training runner implementations.

TrainConfig

Top-level configuration structure.

ModelConfig

Model configuration structure.

AlgorithmConfig

Base class for all specific algorithm configurations.

ModelSelectionBase

Base class for model selection strategies.

BestModelSelection

Best model selection strategy.
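
To illustrate what a best-model selection strategy typically does, here is a minimal, hypothetical sketch: it keeps the model with the best evaluation value seen so far and persists it whenever the value improves. The constructor arguments and the update() hook are assumptions for illustration only; the actual ModelSelectionBase interface in the library may differ.

import math

import torch
from torch import nn


class BestModelSelection:
    """Sketch of a best-model selection strategy (interface is illustrative)."""

    def __init__(self, dump_file: str, model: nn.Module):
        self.dump_file = dump_file
        self.model = model
        self.best_value = -math.inf

    def update(self, value: float) -> None:
        """Hypothetical hook, called once per evaluation round with the current value."""
        if value > self.best_value:
            self.best_value = value
            # Persist the currently best model parameters to the (hypothetical) dump file.
            torch.save(self.model.state_dict(), self.dump_file)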

Trainers

These are interfaces, classes and utility functions for built-in trainers:

Actor-Critics (AC)

MultiStepActorCritic

Base class for multi-step actor-critic trainers.

MultiStepActorCriticEvents

Event interface, defining statistics emitted by the A2CTrainer.

MultiStepA2C

Multi-step advantage actor-critic (A2C).

A2CAlgorithmConfig

Algorithm parameters for multi-step A2C model.

MultiStepPPO

Multi-step Proximal Policy Optimization (PPO).

PPOAlgorithmConfig

Algorithm parameters for multi-step PPO model.

MultiStepIMPALA

Multi-step IMPALA (Importance Weighted Actor-Learner Architecture).

ImpalaAlgorithmConfig

Algorithm parameters for Impala.

MultiStepIMPALAEvents

Events specific to the IMPALA algorithm, recorded in order to analyse its behaviour in more detail.

ImpalaLearner

Learner agent for Impala.

batch_outputs_time_major

Batch the collected output in time-major format.

log_probs_from_logits_and_actions_and_spaces

Computes action log-probs from policy logits, actions and action_spaces.

from_logits

V-trace for softmax policies.

from_importance_weights

V-trace from log importance weights.

get_log_rhos

From the selected log_probs of the behaviour and target policies for multi-discrete actions, computes the log_rhos used in the V-trace calculation.
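
The V-trace helpers listed above follow the IMPALA paper (Espeholt et al., 2018). As an illustration, the following is a minimal numpy sketch of the core correction performed when computing V-trace from log importance weights; the function name, argument names and exact signature of from_importance_weights in the library may differ, and inputs are assumed to be time-major [T, B] arrays.

import numpy as np


def vtrace_from_importance_weights(log_rhos, discounts, rewards, values, bootstrap_value,
                                   clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """Compute V-trace value targets and policy-gradient advantages.

    log_rhos, discounts, rewards, values: time-major arrays of shape [T, B].
    bootstrap_value: value estimate for the state after the last step, shape [B].
    """
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)
    cs = np.minimum(1.0, rhos)

    # Values shifted by one step in time, bootstrapped at the end of the trajectory.
    values_t_plus_1 = np.concatenate([values[1:], bootstrap_value[None, :]], axis=0)
    deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

    # Backward recursion: vs_t - V(x_t) = delta_t + gamma_t * c_t * (vs_{t+1} - V(x_{t+1})).
    vs_minus_v_xs = np.zeros_like(values)
    acc = np.zeros_like(bootstrap_value)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v_xs[t] = acc
    vs = vs_minus_v_xs + values

    # Advantages for the policy gradient, using the V-trace targets as bootstrap.
    vs_t_plus_1 = np.concatenate([vs[1:], bootstrap_value[None, :]], axis=0)
    clipped_pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
    pg_advantages = clipped_pg_rhos * (rewards + discounts * vs_t_plus_1 - values)
    return vs, pg_advantages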

Evolutionary Strategies (ES)

ESTrainer

Trainer class for OpenAI Evolution Strategies.

ESAlgorithmConfig

Algorithm parameters for evolution strategies model.

ESEvents

Event interface, defining statistics emitted by the ESTrainer.

ESMasterRunner

Base class of ES training master runners (serves as basis for dev and other runners).

ESDevRunner

Runner config for single-threaded training, based on ESDummyDistributedRollouts.

SharedNoiseTable

A fixed length vector of deterministically generated pseudo-random floats.
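
A minimal sketch of the shared-noise-table idea: because the table is filled deterministically from a fixed seed, workers can reconstruct a perturbation from its start index alone and only indices need to be communicated. The constructor arguments and method names below are illustrative, not the library's exact API.

import numpy as np


class SharedNoiseTable:
    """Sketch: fixed-length vector of deterministically generated pseudo-random floats."""

    def __init__(self, count: int = 1_000_000, seed: int = 42):
        # Same seed on every worker -> identical noise table everywhere.
        self.noise = np.random.RandomState(seed).randn(count).astype(np.float32)

    def get(self, index: int, size: int) -> np.ndarray:
        """Return a contiguous noise slice of the requested size, starting at index."""
        return self.noise[index:index + size]

    def sample_index(self, rng: np.random.RandomState, size: int) -> int:
        """Draw a random start index such that a slice of `size` still fits into the table."""
        return int(rng.randint(0, len(self.noise) - size + 1))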

Optimizer

Abstract base class of an optimizer to be used with ES.

SGD

Stochastic gradient descent with momentum.

Adam

Adam optimizer.
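
In ES, the optimizer operates on a single flat parameter vector rather than on a framework-specific parameter list. The following is a hedged sketch of such an optimizer and an SGD-with-momentum step on a flat numpy vector; class and method names are illustrative and may not match the library's implementation.

import numpy as np


class Optimizer:
    """Sketch of an optimizer operating on a flat parameter vector, as used with ES."""

    def __init__(self, dim: int):
        self.dim = dim
        self.t = 0

    def update(self, theta: np.ndarray, gradient: np.ndarray) -> np.ndarray:
        """Apply one update step and return the new flat parameter vector."""
        self.t += 1
        return theta + self._compute_step(gradient)

    def _compute_step(self, gradient: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class SGD(Optimizer):
    """Stochastic gradient descent with momentum on the flat parameter vector."""

    def __init__(self, dim: int, step_size: float, momentum: float = 0.9):
        super().__init__(dim)
        self.v = np.zeros(dim, dtype=np.float32)
        self.step_size, self.momentum = step_size, momentum

    def _compute_step(self, gradient: np.ndarray) -> np.ndarray:
        # Exponential moving average of the gradient estimate, then a step against it.
        self.v = self.momentum * self.v + (1.0 - self.momentum) * gradient
        return -self.step_size * self.v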

ESRolloutResult

Result structure for distributed rollouts.

ESDummyDistributedRollouts

Implementation of the ES distribution by running the rollouts synchronously in the same process.

ESDistributedRollouts

Abstract base class of ES rollout distribution.

ESAbortException

This exception is raised if the current rollout is intentionally aborted.

ESRolloutWorkerWrapper

Binds rollout generation to a single worker environment by implementing it as a Wrapper class.

get_flat_parameters

Get the parameters of all sub-policies as a single flat vector.

set_flat_parameters

Overwrite the parameters of all sub-policies by a single flat vector.
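
Flattening and restoring parameters is what connects the ES optimizer (which works on flat vectors) to the torch policies. A minimal sketch of the two operations is shown below; in the library they may be methods on the rollout worker wrapper rather than free functions, and the signatures here are assumptions.

from typing import List

import torch
from torch import nn


def get_flat_parameters(policies: List[nn.Module]) -> torch.Tensor:
    """Concatenate the parameters of all sub-policies into one flat vector."""
    return torch.cat([p.data.reshape(-1) for policy in policies for p in policy.parameters()])


def set_flat_parameters(policies: List[nn.Module], flat: torch.Tensor) -> None:
    """Overwrite the parameters of all sub-policies from one flat vector."""
    offset = 0
    for policy in policies:
        for p in policy.parameters():
            n = p.numel()
            p.data.copy_(flat[offset:offset + n].view_as(p))
            offset += n
    assert offset == flat.numel(), "flat vector size must match the total parameter count"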

Imitation Learning (IL) and Learning from Demonstrations (LfD)

ImitationEvents

Event interface defining statistics emitted by the imitation learning trainers.

ImitationEvaluator

Abstract interface for imitation learning evaluation.

ImitationRunner

Dev runner for imitation learning.

ParallelLoadedImitationDataset

A version of the in-memory dataset that loads all data in parallel.

DataLoadWorker

Data loading worker used to map states to actual observations.

InMemoryImitationDataSet

Trajectory data set for imitation learning.

BCTrainer

Trainer for behavioral cloning (BC).

BCAlgorithmConfig

Algorithm parameters for behavioral cloning.

BCEvaluator

Evaluates a given policy on validation data.

BCLoss

Loss function for behavioral cloning.
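
Conceptually, a behavioral cloning loss is the negative log-likelihood of the demonstrated actions under the current policy. The sketch below shows this for a single discrete action head using cross-entropy; the library's BCLoss may additionally handle dict action spaces (summing over sub-action heads) and continuous actions, and the function name here is illustrative.

import torch
import torch.nn.functional as F


def behavioral_cloning_loss(action_logits: torch.Tensor, target_actions: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood (cross-entropy) of the demonstrated actions under the policy.

    action_logits: [batch, num_actions] unnormalized policy logits for a discrete action space.
    target_actions: [batch] integer actions taken by the demonstrator.
    """
    return F.cross_entropy(action_logits, target_actions)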

Utilities

stack_numpy_dict_list

Stack list of dictionaries holding numpy arrays as values.

unstack_numpy_list_dict

Inverse of stack_numpy_dict_list().

compute_gradient_norm

Computes the cumulative gradient norm of all provided parameters.

stack_torch_dict_list

Stack list of dictionaries holding torch tensors as values.
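
To make the intent of these utilities concrete, here is a hedged sketch of the stacking helpers and the gradient-norm computation; the actual implementations of stack_numpy_dict_list, unstack_numpy_list_dict and compute_gradient_norm may handle nested dictionaries and edge cases that are omitted here.

from typing import Dict, List

import numpy as np


def stack_numpy_dict_list(dict_list: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Stack a list of dicts holding numpy arrays into one dict of stacked arrays.

    Example: [{"obs": a_0}, {"obs": a_1}] -> {"obs": np.stack([a_0, a_1])}
    """
    return {key: np.stack([d[key] for d in dict_list]) for key in dict_list[0]}


def unstack_numpy_list_dict(stacked: Dict[str, np.ndarray]) -> List[Dict[str, np.ndarray]]:
    """Inverse of stack_numpy_dict_list: split along the leading axis back into a list of dicts."""
    length = len(next(iter(stacked.values())))
    return [{key: value[i] for key, value in stacked.items()} for i in range(length)]


def compute_gradient_norm(parameters) -> float:
    """Cumulative L2 norm over the gradients of all provided (torch) parameters."""
    total = 0.0
    for p in parameters:
        if p.grad is not None:
            total += float(p.grad.data.norm(2).item() ** 2)
    return total ** 0.5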