MultiStepIMPALAEvents

class maze.train.trainers.impala.impala_events.MultiStepIMPALAEvents

Events specific to the IMPALA algorithm, used to record and analyse its behaviour in more detail

critic_grad_norm(critic_key: Union[int, str], value: float)

Record the critic gradient norm

Parameters
  • critic_key – the key of the critic

  • value – the critic gradient norm to record
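
The documentation does not say how this norm is computed. A common choice, and the value torch.nn.utils.clip_grad_norm_ returns by default, is the global L2 norm of all parameter gradients viewed as one vector. The sketch below assumes that definition, uses plain lists of floats in place of tensors, and records into a hypothetical dict keyed by critic key instead of the real event system. The same computation would apply to policy_grad_norm.

```python
import math
from typing import Dict, List, Sequence, Union

Key = Union[int, str]

def global_grad_norm(grads: Sequence[Sequence[float]]) -> float:
    """Global L2 norm over all gradients, treated as one flattened vector."""
    # Square every entry of every parameter's gradient, sum, then take the root
    return math.sqrt(sum(g * g for grad in grads for g in grad))

# Hypothetical stand-in for the event system: critic_key -> recorded norms
recorded: Dict[Key, List[float]] = {}

def critic_grad_norm(critic_key: Key, value: float) -> None:
    """Stand-in for the event method: append the value under its critic key."""
    recorded.setdefault(critic_key, []).append(value)

# Two parameters with gradients [3, 0] and [4]: sqrt(9 + 0 + 16) = 5
critic_grad_norm(0, global_grad_norm([[3.0, 0.0], [4.0]]))
print(recorded)  # {0: [5.0]}
```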

critic_value(critic_key: Union[int, str], value: float)

Record the critic value

Parameters
  • critic_key – the key of the critic

  • value – the critic value to record

critic_value_loss(critic_key: Union[int, str], value: float)

Record the critic value loss

Parameters
  • critic_key – the key of the critic

  • value – the critic value loss to record
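
The source does not define the critic loss. In the IMPALA paper the baseline is regressed towards the V-trace targets, and many implementations use half the squared error for this. The sketch below assumes that form, averaged over the batch, with hypothetical critic keys and numbers; the scaling and reduction actually used by the trainer may differ.

```python
from typing import Dict, List, Tuple, Union

def value_loss(values: List[float], targets: List[float]) -> float:
    """Half the mean squared error between critic predictions and value targets."""
    if not values or len(values) != len(targets):
        raise ValueError("values and targets must be non-empty and equally long")
    return 0.5 * sum((t - v) ** 2 for v, t in zip(values, targets)) / len(values)

# Hypothetical multi-critic batch: critic_key -> (predicted values, V-trace targets)
batch: Dict[Union[int, str], Tuple[List[float], List[float]]] = {
    "critic_a": ([1.0, 2.0], [1.0, 4.0]),
    "critic_b": ([0.5], [0.5]),
}

# One loss per critic key, matching the critic_key argument of critic_value_loss
losses = {key: value_loss(v, t) for key, (v, t) in batch.items()}
print(losses)  # {'critic_a': 1.0, 'critic_b': 0.0}
```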

estimated_queue_sizes(before: int, after: int)

Record the estimated queue size before and after collecting the actors' output

Parameters
  • before – the estimated queue size before collection

  • after – the estimated queue size after collection
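
The reason the sizes are only estimated is not given here, but Python's own queue.Queue.qsize() and multiprocessing.Queue.qsize() are documented as approximate, since producers may add items concurrently. The sketch below shows the before/after measurement pattern with a single-threaded queue.Queue; the collect helper and the integer "rollouts" are hypothetical stand-ins for the learner's collection step.

```python
import queue
from typing import List, Tuple

def collect(q: "queue.Queue[int]", batch_size: int) -> Tuple[List[int], int, int]:
    """Dequeue up to batch_size items; report the queue size before and after."""
    before = q.qsize()  # approximate if producers are still running
    batch: List[int] = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # fewer items available than requested
    after = q.qsize()
    return batch, before, after

q = queue.Queue()
for rollout_id in range(5):  # five hypothetical actor rollouts
    q.put(rollout_id)

batch, before, after = collect(q, batch_size=3)
print(batch, before, after)  # [0, 1, 2] 5 2
```

The two sizes would then be passed as estimated_queue_sizes(before=before, after=after).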

policy_entropy(step_key: Union[int, str], value: float)

Record the policy entropy

Parameters
  • step_key – the step_key of the multi-step env

  • value – the policy entropy to record
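
For a discrete action head, the entropy of the categorical distribution defined by the logits is H = -sum_i p_i log p_i with p = softmax(logits). The sketch below computes it per step key for a hypothetical two-step environment; whether the recorded value is exactly this quantity, rather than for example a batch mean or a sum over several action heads, is an assumption.

```python
import math
from typing import Dict, List, Union

def categorical_entropy(logits: List[float]) -> float:
    """Entropy (in nats) of softmax(logits): H = -sum_i p_i * log(p_i)."""
    m = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical multi-step env: one action head per step key
step_logits: Dict[Union[int, str], List[float]] = {
    0: [0.0, 0.0, 0.0, 0.0],   # uniform over 4 actions -> maximal entropy log(4)
    "place": [10.0, 0.0],      # almost deterministic -> entropy close to 0
}
entropies = {key: categorical_entropy(logits) for key, logits in step_logits.items()}
print(entropies)
```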

policy_grad_norm(step_key: Union[int, str], value: float)

Record the policy gradient norm

Parameters
  • step_key – the step_key of the multi-step env

  • value – the policy gradient norm to record

policy_loss(step_key: Union[int, str], value: float)

Record the policy loss

Parameters
  • step_key – the step_key of the multi-step env

  • value – the policy loss to record
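
The IMPALA paper (Espeholt et al., 2018) updates the policy in the direction ρ_s ∇ log π(a_s|x_s) (r_s + γ v_{s+1} − V(x_s)), where ρ_s = min(ρ̄, π(a_s|x_s)/μ(a_s|x_s)) is a truncated importance weight and v_{s+1} is the V-trace target. The sketch below turns that into a scalar loss to minimize, averaged over time steps. It is the textbook formulation, not necessarily how the recorded value is computed here; all inputs are hypothetical, and some implementations sum instead of averaging.

```python
import math
from typing import List

def pg_advantages(rhos: List[float], rewards: List[float], vs_next: List[float],
                  values: List[float], gamma: float, rho_bar: float = 1.0) -> List[float]:
    """Truncated-IS advantages: min(rho_bar, rho_s) * (r_s + gamma * v_{s+1} - V(x_s))."""
    return [min(rho_bar, rho) * (r + gamma * v_next - v)
            for rho, r, v_next, v in zip(rhos, rewards, vs_next, values)]

def policy_loss(log_probs: List[float], advantages: List[float]) -> float:
    """Negative mean of log pi(a_s|x_s) weighted by the advantages.

    In an autodiff framework the advantages would be detached, so that only
    log pi receives gradients; plain floats carry no gradients here.
    """
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / len(log_probs)

# Hypothetical two-step trajectory
adv = pg_advantages(rhos=[2.0, 0.5], rewards=[1.0, 0.0], vs_next=[0.0, 1.0],
                    values=[0.5, 0.5], gamma=0.9)
loss = policy_loss([math.log(0.5), math.log(0.25)], adv)
print(adv, loss)
```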

time_backprob(time: float, percent: float)

Record the total time it took the learner to backpropagate the loss, plus its percentage of the total update time

Parameters
  • time – the absolute time it took for the computation

  • percent – the percentage of one update's total time taken by this computation

time_collecting_actors(time: float, percent: float)

Record the total time it took the learner to collect the actors' output, plus its percentage of the total update time

Parameters
  • time – the absolute time it took for the computation

  • percent – the percentage of one update's total time taken by this computation

time_dequeuing_actors(time: float, percent: float)

Record the time it took to dequeue the actors' output from the synced queue, plus its percentage of the total update time

Parameters
  • time – the absolute time it took for the computation

  • percent – the percentage of one update's total time taken by this computation

time_learner_rollout(time: float, percent: float)

Record the total time it took the learner to compute the logits on the agents' output, plus its percentage of the total update time

Parameters
  • time – the absolute time it took for the computation

  • percent – the percentage of one update's total time taken by this computation

time_loss_computation(time: float, percent: float)

Record the total time it took the learner to compute the loss, plus its percentage of the total update time

Parameters
  • time – the absolute time it took for the computation

  • percent – the percentage of one update's total time taken by this computation
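
Each time_* event takes an absolute duration and that duration's share of one update. The sketch below measures phases with time.perf_counter inside a single update and derives the share as 100 * phase_time / total_update_time. Assumptions: percent is on a 0 to 100 scale (not a 0 to 1 fraction), dequeuing happens inside collecting, and the phase names and sleep calls are hypothetical placeholders for the learner's real work. Because of the nesting, the percentages are not expected to sum to 100.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple

phase_times: Dict[str, float] = {}

@contextmanager
def timed(phase: str) -> Iterator[None]:
    """Add the wall-clock seconds spent inside the with-block to phase_times[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_times[phase] = phase_times.get(phase, 0.0) + time.perf_counter() - start

def breakdown(total: float) -> Dict[str, Tuple[float, float]]:
    """Map each phase to (absolute seconds, percent of the total update time)."""
    return {phase: (t, 100.0 * t / total) for phase, t in phase_times.items()}

update_start = time.perf_counter()
with timed("collecting_actors"):        # reported via time_collecting_actors
    with timed("dequeuing_actors"):     # reported via time_dequeuing_actors
        time.sleep(0.01)
    time.sleep(0.005)                   # placeholder for other collection work
with timed("learner_rollout"):          # reported via time_learner_rollout
    time.sleep(0.01)
with timed("loss_computation"):         # reported via time_loss_computation
    time.sleep(0.005)
with timed("backprop"):                 # reported via time_backprob
    time.sleep(0.01)
total = time.perf_counter() - update_start

for phase, (secs, pct) in breakdown(total).items():
    print(f"{phase}: {secs:.3f}s ({pct:.1f}%)")
```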