network


class ActorFactoryArmscanNet

A factory for creating MultiMod_ArmScan_Net actors for the armscan_env.

create_module(envs: Environments, device: str | device) → ActorProb

Creates a MultiMod_ArmScan_Net actor for the given environments and wraps it in an ActorProb.
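Conceptually, the factory instantiates a MultiMod_ArmScan_Net from the environment's shapes and wraps it in Tianshou's ActorProb. A minimal sketch of that pattern follows; the helper name, the import path armscan_env.network, and the preprocess_net_output_dim choice are assumptions, not the confirmed implementation:

    from tianshou.utils.net.continuous import ActorProb

    from armscan_env.network import MultiMod_ArmScan_Net  # import path assumed


    def create_armscan_actor(c, h, w, action_dim, n_stack, device="cpu"):
        """Hypothetical helper mirroring what create_module plausibly does."""
        net = MultiMod_ArmScan_Net(
            c=c, h=h, w=w, action_dim=action_dim, n_stack=n_stack, device=device
        )
        # ActorProb maps the network's features to a Gaussian action distribution.
        return ActorProb(
            net,
            action_shape=(action_dim,),
            device=device,
            preprocess_net_output_dim=action_dim,  # assumed to match the net's output
        ).to(device)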

class LabelmapsObsBatchProtocol(*args, **kwargs)

Batch protocol for observations produced by the LabelmapSliceAsChannelsObservation class. Must have the same fields as the ChanneledLabelmapsObsWithActReward TypedDict.

action: ndarray
channeled_slice: ndarray
reward: ndarray
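
A batch satisfying this protocol can be constructed with tianshou.data.Batch. The shapes below are purely illustrative (one observation with 4 channels of 84×84 slices and a 4-dimensional action), not taken from the environment:

    import numpy as np
    from tianshou.data import Batch

    obs_batch = Batch(
        channeled_slice=np.zeros((1, 4, 84, 84), dtype=np.float32),  # (batch, c, h, w)
        action=np.zeros((1, 4), dtype=np.float32),                   # last action taken
        reward=np.zeros((1, 1), dtype=np.float32),                   # previous reward
    )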
class MultiMod_ArmScan_Net(c: int, h: int, w: int, action_dim: int, n_stack: int, device: str | int | torch.device = 'cpu', mlp_output_dim: int = 512, layer_init: Callable[[torch.nn.Module], torch.nn.Module] = layer_init)

A composed network with a CNN for the channeled slice observation and an MLP for the action-reward observation; a minimal sketch of the full architecture follows the layer listing below. The CNN is composed of 3 convolutional layers with ReLU activation functions:

  • input: channeled slice observation,

  • first layer: 32 filters with kernel size 8 and stride 4,

  • second layer: 64 filters with kernel size 4 and stride 2,

  • third layer: 64 filters with kernel size 3 and stride 1,

  • output: flattened feature vector.

The MLP is composed of 2 linear layers with ReLU activation functions:

  • input: last action and previous reward concatenated,

  • hidden layer: 512 units,

  • output layer: mlp_output_dim units.

The final processing MLP is composed of 3 linear layers with ReLU activation functions:

  • input: concatenation of the CNN and MLP outputs,

  • first layer: 512 units,

  • second layer: 512 units,

  • output layer: action_dim units.
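
The listing above maps directly onto standard PyTorch layers. The following is a minimal sketch of the described architecture, not the actual implementation: the class name is hypothetical, and the MLP input width n_stack * (action_dim + 1) is an assumption about how the stacked action-reward history is flattened.

    import torch
    import torch.nn as nn


    class ComposedNetSketch(nn.Module):  # hypothetical name, for illustration only
        def __init__(self, c: int, h: int, w: int, action_dim: int, n_stack: int,
                     mlp_output_dim: int = 512):
            super().__init__()
            # CNN branch over the channeled slice observation.
            self.cnn = nn.Sequential(
                nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            with torch.no_grad():  # infer the flattened CNN output width
                cnn_out = self.cnn(torch.zeros(1, c, h, w)).shape[1]
            # MLP branch over the concatenated action-reward history (input width assumed).
            self.mlp = nn.Sequential(
                nn.Linear(n_stack * (action_dim + 1), 512), nn.ReLU(),
                nn.Linear(512, mlp_output_dim), nn.ReLU(),
            )
            # Final processing MLP over the concatenated branch outputs.
            self.head = nn.Sequential(
                nn.Linear(cnn_out + mlp_output_dim, 512), nn.ReLU(),
                nn.Linear(512, 512), nn.ReLU(),
                nn.Linear(512, action_dim),
            )

        def forward(self, channeled_slice: torch.Tensor,
                    action_reward: torch.Tensor) -> torch.Tensor:
            features = torch.cat(
                [self.cnn(channeled_slice), self.mlp(action_reward)], dim=1
            )
            return self.head(features)  # one Q value per action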

forward(obs: LabelmapsObsBatchProtocol, state: Any | None = None) → tuple[Tensor, Any]

Mapping: s -> Q(s, *).

This method generates the Q value from the given input data:

  • the channeled_slice observation is passed through the CNN,

  • the last action and previous reward are concatenated and passed through the MLP,

  • the outputs of the CNN and MLP are concatenated and passed through the final MLP.

The output of the final MLP is the Q value of each action.
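A hedged call sketch (the import path and all shapes are assumptions for illustration; the real shapes come from the environment's observation space):

    import numpy as np
    from tianshou.data import Batch

    from armscan_env.network import MultiMod_ArmScan_Net  # import path assumed

    net = MultiMod_ArmScan_Net(c=4, h=84, w=84, action_dim=4, n_stack=1)
    obs = Batch(
        channeled_slice=np.zeros((1, 4, 84, 84), dtype=np.float32),
        action=np.zeros((1, 4), dtype=np.float32),
        reward=np.zeros((1, 1), dtype=np.float32),
    )
    q_values, state = net(obs)  # q_values: shape (1, action_dim)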

layer_init(layer: Module, std: float = 1.4142135623730951, bias_const: float = 0.0) → Module

Initialize a layer's weights with the given standard deviation (default √2) and its bias with the given constant.
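
The default std of 1.4142135623730951 is √2, the gain conventionally paired with orthogonal weight initialization in deep-RL codebases. Assuming that convention (the actual scheme should be confirmed against the source link), the function plausibly looks like:

    import math

    import torch.nn as nn


    def layer_init(layer: nn.Module, std: float = math.sqrt(2),
                   bias_const: float = 0.0) -> nn.Module:
        # Assumed scheme: orthogonal weights scaled by `std`, constant bias.
        nn.init.orthogonal_(layer.weight, std)
        nn.init.constant_(layer.bias, bias_const)
        return layer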