Trainers and Training Runners¶
This page contains the reference documentation for trainers and training runners:
General¶
These are general interfaces, classes and utility functions for trainers and training runners:
- Interface for trainers.
- Base class for training runner implementations.
- Top-level configuration structure.
- Model configuration structure.
- Base class for all algorithm-specific configurations.
- Base class for model selection strategies.
- Best model selection strategy.
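As a minimal sketch of how a best-model selection strategy might behave (class and method names here are hypothetical, not the library's actual API): keep the model state whose evaluation reward is highest so far.

```python
import math


class BestModelSelection:
    """Hypothetical sketch of a best-model selection strategy:
    retain the model state with the highest evaluation reward."""

    def __init__(self):
        self.best_reward = -math.inf
        self.best_state = None

    def update(self, reward: float, model_state: dict) -> bool:
        """Record an evaluation result; return True if it is a new best."""
        if reward > self.best_reward:
            self.best_reward = reward
            self.best_state = dict(model_state)  # snapshot the parameters
            return True
        return False
```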
Trainers¶
These are interfaces, classes and utility functions for built-in trainers:
Actor-Critics (AC)¶
- Base class for multi-step actor-critic trainers.
- Event interface defining statistics emitted by the A2CTrainer.
- Multi-step advantage actor-critic (A2C).
- Algorithm parameters for the multi-step A2C model.
- Multi-step Proximal Policy Optimization (PPO).
- Algorithm parameters for the multi-step PPO model.
|
- Multi-step advantage actor-critic trainer for Impala.
- Algorithm parameters for Impala.
- Events specific to the Impala algorithm, recording and analysing its behaviour in more detail.
- Learner agent for Impala.
- Batches the collected output in time-major format.
- Computes action log-probs from policy logits, actions, and action_spaces.
- V-trace for softmax policies.
- V-trace from log importance weights.
- Computes the log_rhos for the V-trace calculation from the selected log-probs of multi-discrete actions under the behavior and target policies.
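The V-trace computation from log importance weights can be sketched as follows. The function name, argument names, and time-major `[T, B]` layout are assumptions for illustration, not the library's actual API; the recursion follows the standard V-trace definition (clipped importance weights, backward accumulation of temporal-difference corrections).

```python
import numpy as np


def vtrace_from_log_rhos(log_rhos, discounts, rewards, values, bootstrap_value,
                         clip_rho_threshold=1.0, clip_c_threshold=1.0):
    """Sketch of V-trace targets from log importance weights, time-major [T, B].

    log_rhos: log pi_target(a|x) - log pi_behavior(a|x) per step.
    Returns the V-trace value targets vs_t.
    """
    rhos = np.minimum(clip_rho_threshold, np.exp(log_rhos))
    cs = np.minimum(clip_c_threshold, np.exp(log_rhos))
    # values shifted one step forward, bootstrapped at the final step
    values_t_plus_1 = np.concatenate([values[1:], bootstrap_value[None]], axis=0)
    deltas = rhos * (rewards + discounts * values_t_plus_1 - values)
    # backward recursion: vs_t - V(x_t) = delta_t + gamma_t * c_t * (vs_{t+1} - V(x_{t+1}))
    acc = np.zeros_like(bootstrap_value)
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    return vs_minus_v + values
```

When behavior and target policies coincide (log_rhos of zero), the targets reduce to ordinary n-step returns.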
Evolutionary Strategies (ES)¶
- Trainer class for OpenAI Evolution Strategies.
- Algorithm parameters for the evolution strategies model.
- Event interface defining statistics emitted by the ESTrainer.
- Base class of ES training master runners (serves as a basis for dev and other runners).
- Runner config for single-threaded training, based on ESDummyDistributedRollouts.
- A fixed-length vector of deterministically generated pseudo-random floats.
- Abstract base class of an optimizer to be used with ES.
- Stochastic gradient descent with momentum.
- Adam optimizer.
- Result structure for distributed rollouts.
- Implementation of the ES distribution that runs the rollouts synchronously in the same process.
- Abstract base class of an ES rollout distribution.
- Exception raised when the current rollout is intentionally aborted.
- Binds rollout generation to a single worker environment by implementing it as a Wrapper class.
- Get the parameters of all sub-policies as a single flat vector.
- Overwrite the parameters of all sub-policies with a single flat vector.
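The flat-vector parameter interface listed above can be sketched as a pair of helpers; the names `get_flat`/`set_flat` and the assumption that parameters are numpy arrays are illustrative only.

```python
import numpy as np


def get_flat(params):
    """Concatenate a list of parameter arrays into one flat vector."""
    return np.concatenate([p.ravel() for p in params])


def set_flat(params, flat):
    """Overwrite the parameter arrays in place from a flat vector."""
    offset = 0
    for p in params:
        p[...] = flat[offset:offset + p.size].reshape(p.shape)
        offset += p.size
```

ES relies on exactly this view: it perturbs the flat vector with Gaussian noise, writes it back into the policy before each rollout, and applies the estimated gradient update on the flat vector.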
Imitation Learning (IL) and Learning from Demonstrations (LfD)¶
- Event interface defining statistics emitted by the imitation learning trainers.
- Abstract interface for imitation learning evaluation.
- Dev runner for imitation learning.
- A version of the in-memory dataset that loads all data in parallel.
- Data-loading worker used to map states to actual observations.
- Trajectory dataset for imitation learning.
- Trainer for behavioral cloning.
- Algorithm parameters for behavioral cloning.
- Evaluates a given policy on validation data.
- Loss function for behavioral cloning.
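A behavioral cloning loss for discrete actions can be sketched as the mean negative log-likelihood of the expert actions under the policy's softmax distribution. The function name and the `[batch, num_actions]` logits layout are assumptions, not the library's actual signature.

```python
import numpy as np


def behavioral_cloning_loss(logits, expert_actions):
    """Sketch of a discrete-action behavioral cloning loss:
    mean negative log-likelihood of expert actions under softmax(logits)."""
    # log-softmax with max-subtraction for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(expert_actions)), expert_actions]
    return nll.mean()
```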
Utilities¶
- Stack a list of dictionaries holding numpy arrays as values.
- Inverse of the stacking operation above.
- Computes the cumulative gradient norm of all provided parameters.
- Stack a list of dictionaries holding torch tensors as values.
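The dictionary-stacking utilities can be sketched for the numpy case as follows; the function names are hypothetical, and the sketch assumes all dictionaries share the same keys and per-key shapes.

```python
import numpy as np


def stack_numpy_dict_list(dict_list):
    """Stack a list of dicts of numpy arrays into one dict of stacked arrays."""
    return {key: np.stack([d[key] for d in dict_list]) for key in dict_list[0]}


def unstack_numpy_dict(stacked):
    """Inverse operation: split a dict of stacked arrays back into per-step dicts."""
    n = len(next(iter(stacked.values())))
    return [{key: value[i] for key, value in stacked.items()} for i in range(n)]
```

This pattern is what turns a rollout (a list of per-step observation dicts) into batched arrays for the trainer, and back again.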