network#
Source code: armscan_env/network.py
- class ActorFactoryArmscanNet[source]#
A factory for creating MultiMod_ArmScan_Net actors for the armscan_env.
- class LabelmapsObsBatchProtocol(*args, **kwargs)[source]#
Batch protocol for observations of the LabelmapSliceAsChannelsObservation class. It must have the same fields as the TDict of ChanneledLabelmapsObsWithActReward (a minimal construction example follows the field list below).
- action: ndarray#
- channeled_slice: ndarray#
- reward: ndarray#
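The following is a minimal sketch of observation data carrying the three fields above, built with Tianshou's generic Batch container. The array shapes (a stack of four 64x64 slices, a 4-dimensional last action, a scalar previous reward) are illustrative assumptions, not values taken from the environment.

import numpy as np
from tianshou.data import Batch

# Illustrative shapes only: batch of 1, n_stack=4 channeled 64x64 slices,
# a 4-dimensional last action, and a scalar previous reward.
obs_batch = Batch(
    channeled_slice=np.zeros((1, 4, 64, 64), dtype=np.float32),
    action=np.zeros((1, 4), dtype=np.float32),
    reward=np.zeros((1, 1), dtype=np.float32),
)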
- class MultiMod_ArmScan_Net(c: int, h: int, w: int, action_dim: int, n_stack: int, device: str | int | torch.device = 'cpu', mlp_output_dim: int = 512, layer_init: Callable[[torch.nn.Module], torch.nn.Module] = <function layer_init>)[source]#
A composed network with a CNN for the channeled slice observation and an MLP for the action-reward observation (an illustrative sketch of this composition follows the lists below).

The CNN is composed of 3 convolutional layers with ReLU activation functions:
- input: channeled slice observation,
- first layer: 32 filters with kernel size 8 and stride 4,
- second layer: 64 filters with kernel size 4 and stride 2,
- third layer: 64 filters with kernel size 3 and stride 1,
- output: flattened output.

The MLP is composed of 2 linear layers with ReLU activation functions:
- input: last action and previous reward concatenated,
- hidden layer: 512 units,
- output layer: mlp_output_dim units.

The final processing MLP is composed of 3 linear layers with ReLU activation functions:
- input: concatenation of the CNN and MLP outputs,
- first layer: 512 units,
- second layer: 512 units,
- output layer: action_dim units.
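Below is a minimal, illustrative PyTorch sketch of that composition, not the actual MultiMod_ArmScan_Net implementation: the input width of the action-reward branch (action_dim + 1, i.e. the last action plus a scalar reward), the ReLU-free output layer of the head, and the omission of n_stack, device, and layer_init handling are simplifying assumptions.

import torch
import torch.nn as nn


class ComposedQNetSketch(nn.Module):
    # Illustrative re-creation of the documented architecture; the real class
    # is MultiMod_ArmScan_Net in armscan_env/network.py.

    def __init__(self, c: int, h: int, w: int, action_dim: int, mlp_output_dim: int = 512):
        super().__init__()
        # CNN branch for the channeled slice observation.
        self.cnn = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size from a dummy forward pass.
        with torch.no_grad():
            cnn_out_dim = self.cnn(torch.zeros(1, c, h, w)).shape[1]
        # MLP branch for the concatenated last action and previous reward
        # (action_dim + 1 input features is an assumption: action plus scalar reward).
        self.action_reward_mlp = nn.Sequential(
            nn.Linear(action_dim + 1, 512), nn.ReLU(),
            nn.Linear(512, mlp_output_dim), nn.ReLU(),
        )
        # Final MLP mapping the concatenated branch outputs to per-action Q values.
        self.head = nn.Sequential(
            nn.Linear(cnn_out_dim + mlp_output_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, action_dim),
        )

    def forward(self, channeled_slice: torch.Tensor, action: torch.Tensor, reward: torch.Tensor) -> torch.Tensor:
        cnn_features = self.cnn(channeled_slice)
        ar_features = self.action_reward_mlp(torch.cat([action, reward], dim=1))
        return self.head(torch.cat([cnn_features, ar_features], dim=1))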
- forward(obs: LabelmapsObsBatchProtocol, state: Any | None = None) tuple[Tensor, Any] [source]#
Mapping: s -> Q(s, *).
This method is used to generate the Q value from the given input data:
- The channeled_slice observation is passed through a CNN,
- The last action and previous reward are concatenated and passed through an MLP,
- The outputs of the CNN and MLP are concatenated and passed through a final MLP.
The output of the final MLP is the Q value of each action.
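A hypothetical usage of the sketch defined above illustrates this data flow; the values c=4, h=64, w=64, action_dim=4 and the input shapes are assumptions for illustration, and the real forward method instead takes a LabelmapsObsBatchProtocol and returns a (logits, state) tuple.

# Hypothetical shapes; the head emits one Q value per action.
net = ComposedQNetSketch(c=4, h=64, w=64, action_dim=4)
q_values = net(
    channeled_slice=torch.zeros(1, 4, 64, 64),
    action=torch.zeros(1, 4),
    reward=torch.zeros(1, 1),
)
print(q_values.shape)  # torch.Size([1, 4])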