batch_outputs_time_major

class maze.train.trainers.impala.impala_batching.batch_outputs_time_major(actor_outputs: List[maze.train.parallelization.distributed_actors.actor.AgentOutput], learner_device: str)

Batch the collected actor outputs in time-major format.

Parameters
  • actor_outputs – A list of actor outputs (e.g. rollouts consisting of observations, actions_taken, infos, action_logits, rewards and dones)

  • learner_device – the device (‘cpu’ or ‘cuda’) of the learner

Returns

An ActorOutput named tuple in which the input rollouts have been batched along the second dimension (time-major: the first dimension is time, the second is the batch).
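
To illustrate what "time major" means here, the following is a minimal, library-independent sketch and not the Maze implementation. The names Rollout, TimeMajorBatch and batch_time_major are hypothetical, only two of the fields are modelled, and plain Python lists stand in for tensors, so the transfer to learner_device is omitted. Each of the B rollouts holds T steps, and the batched result is indexed as [t][b].

```python
from typing import List, NamedTuple


class Rollout(NamedTuple):
    """Hypothetical stand-in for one actor's output (not the Maze class)."""
    rewards: List[float]  # one entry per time step, length T
    dones: List[bool]     # one entry per time step, length T


class TimeMajorBatch(NamedTuple):
    """Hypothetical batched result, indexed as [time step][rollout]."""
    rewards: List[List[float]]
    dones: List[List[bool]]


def batch_time_major(rollouts: List[Rollout]) -> TimeMajorBatch:
    """Stack B rollouts of equal length T so that time is the first axis."""
    lengths = {len(r.rewards) for r in rollouts}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty list of equally long rollouts")
    # zip(*...) transposes [b][t] into [t][b]: time first, batch second.
    rewards = [list(step) for step in zip(*(r.rewards for r in rollouts))]
    dones = [list(step) for step in zip(*(r.dones for r in rollouts))]
    return TimeMajorBatch(rewards=rewards, dones=dones)


rollouts = [
    Rollout(rewards=[1.0, 0.0, 2.0], dones=[False, False, True]),
    Rollout(rewards=[0.5, 0.5, 0.5], dones=[False, True, False]),
]
batch = batch_time_major(rollouts)
print(len(batch.rewards), len(batch.rewards[0]))  # T = 3 steps, B = 2 rollouts
```

With tensors, the same layout is typically obtained by stacking the per-rollout tensors along dimension 1, so that a field of shape [T, ...] becomes [T, B, ...].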