MultiStepIMPALAEvents¶
-
class
maze.train.trainers.impala.impala_events.MultiStepIMPALAEvents¶ Events specific to the IMPALA algorithm, used to record and analyse its behaviour in more detail
-
critic_grad_norm(critic_key: Union[int, str], value: float)¶ Record the critic gradient norm
- Parameters
critic_key – the key of the critic
value – the value
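The section does not say how the recorded gradient norm is computed. A minimal sketch, assuming the common convention of a global L2 norm over all gradient entries of one critic; `global_grad_norm` and the `critic_grads` data are hypothetical names used for illustration and are not part of Maze:

```python
import math
from typing import Dict, List, Union


def global_grad_norm(grads: List[List[float]]) -> float:
    """L2 norm over every gradient entry of every (flattened) parameter tensor."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


# Hypothetical per-critic gradients, keyed like critic_key (int or str).
critic_grads: Dict[Union[int, str], List[List[float]]] = {
    0: [[3.0, 0.0], [4.0]],
    "critic_b": [[1.0, 2.0, 2.0]],
}

for critic_key, grads in critic_grads.items():
    # This float is the kind of value one would pass as `value`.
    value = global_grad_norm(grads)
    print(critic_key, value)
```

The same computation would apply to the value passed to policy_grad_norm, with step_key in place of critic_key.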
-
critic_value(critic_key: Union[int, str], value: float)¶ Record the critic value
- Parameters
critic_key – the key of the critic
value – the value
-
critic_value_loss(critic_key: Union[int, str], value: float)¶ Record the critic value loss
- Parameters
critic_key – the key of the critic
value – the value
-
estimated_queue_sizes(before: int, after: int)¶ Record the estimated queue size before and after the collection of the actors' output
- Parameters
before – the estimated queue size before collection
after – the estimated queue size after collection
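To illustrate why the sizes are called estimated, here is a minimal sketch using Python's standard `queue.Queue`, whose `qsize()` returns only an approximate size when other threads are producing or consuming. The queue type Maze actually uses is not stated in this section, and `collect_with_sizes` is a hypothetical helper:

```python
import queue
from typing import Any, List, Tuple


def collect_with_sizes(q: "queue.Queue[Any]", max_items: int) -> Tuple[List[Any], int, int]:
    """Drain up to max_items from q and return (items, before, after)."""
    before = q.qsize()  # approximate if other threads touch the queue
    items: List[Any] = []
    for _ in range(max_items):
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            break  # queue ran dry before max_items were collected
    after = q.qsize()
    return items, before, after


actor_queue: "queue.Queue[int]" = queue.Queue()
for rollout_id in range(5):
    actor_queue.put(rollout_id)

items, before, after = collect_with_sizes(actor_queue, max_items=3)
print(before, after)  # the pair one would record as (before, after)
```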
-
policy_entropy(step_key: Union[int, str], value: float)¶ Record the policy entropy
- Parameters
step_key – the step_key of the multi-step env
value – the value
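As an illustration of the quantity being recorded, the sketch below computes the Shannon entropy (in nats) of a categorical policy from its logits. It assumes a discrete action space; `categorical_entropy` is a hypothetical helper, and Maze's own computation may differ (for example, by averaging over a batch):

```python
import math
from typing import Sequence


def categorical_entropy(logits: Sequence[float]) -> float:
    """Entropy of softmax(logits) in nats, computed in a numerically stable way."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A uniform policy over 4 actions has the maximum entropy, log(4).
print(categorical_entropy([0.0, 0.0, 0.0, 0.0]))
# A strongly peaked policy has entropy close to zero.
print(categorical_entropy([10.0, 0.0, 0.0, 0.0]))
```

A falling entropy over training typically indicates that the policy for that step_key is becoming more deterministic.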
-
policy_grad_norm(step_key: Union[int, str], value: float)¶ Record the policy gradient norm
- Parameters
step_key – the step_key of the multi-step env
value – the value
-
policy_loss(step_key: Union[int, str], value: float)¶ Record the policy loss
- Parameters
step_key – the step_key of the multi-step env
value – the value
-
time_backprob(time: float, percent: float)¶ Record the total time it took the learner to backpropagate the loss, plus its share of the total update time
- Parameters
time – the absolute time it took for the computation
percent – the relative percentage this computation took w.r.t. one update
-
time_collecting_actors(time: float, percent: float)¶ Record the total time it took the learner to collect the actors' output, plus its share of the total update time
- Parameters
time – the absolute time it took for the computation
percent – the relative percentage this computation took w.r.t. one update
-
time_dequeuing_actors(time: float, percent: float)¶ Record the time it took to dequeue the actors' output from the synced queue, plus its share of the total update time
- Parameters
time – the absolute time it took for the computation
percent – the relative percentage this computation took w.r.t. one update
-
time_learner_rollout(time: float, percent: float)¶ Record the total time it took the learner to compute the logits on the agents' output, plus its share of the total update time
- Parameters
time – the absolute time it took for the computation
percent – the relative percentage this computation took w.r.t. one update
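The four timing events share the same (time, percent) shape. A minimal sketch of how such pairs could be derived, assuming that percent is the phase duration divided by the summed duration of all measured phases, expressed as a fraction in [0, 1]; the section does not state whether Maze reports a fraction or a value out of 100, nor what exactly counts towards the total update time. `measure` and `with_shares` are hypothetical helpers:

```python
import time
from typing import Callable, Dict, Tuple


def measure(fn: Callable[[], None]) -> float:
    """Wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def with_shares(durations: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map each phase to (absolute time, share of the summed time)."""
    total = sum(durations.values())
    return {
        name: (d, d / total if total > 0 else 0.0)
        for name, d in durations.items()
    }


# Placeholder workloads standing in for the learner's phases.
durations = {
    "collecting_actors": measure(lambda: sum(range(10_000))),
    "learner_rollout": measure(lambda: sum(range(50_000))),
    "backprob": measure(lambda: sum(range(100_000))),
}

for phase, (seconds, share) in with_shares(durations).items():
    # (seconds, share) is the kind of pair one would pass as (time, percent).
    print(f"{phase}: {seconds:.6f}s ({share:.1%})")
```

Recording both values lets a reader spot a bottleneck (a large share) without losing the absolute cost of each phase.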
-