fusionlab.encoders.vit.vit module
- class fusionlab.encoders.vit.vit.MLPBlock(hidden_size, mlp_dim, dropout_rate=0.0, act=<class 'torch.nn.modules.activation.GELU'>)
Bases: Module
A multi-layer perceptron block, based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”
- __init__(hidden_size, mlp_dim, dropout_rate=0.0, act=<class 'torch.nn.modules.activation.GELU'>)
- Parameters:
hidden_size (int) – dimension of hidden layer.
mlp_dim (int) – dimension of feedforward layer. If 0, hidden_size will be used.
dropout_rate (float) – fraction of the input units to drop.
act (Module) – activation type and arguments. Defaults to nn.GELU.
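The parameters above can be illustrated with a minimal sketch of a ViT-style MLP block. `SketchMLPBlock` is a hypothetical stand-in, not the fusionlab source; the layer order (linear, activation, dropout, linear, dropout) and the dropout range check are assumptions based on the usual ViT feedforward design.

```python
import torch
from torch import nn


class SketchMLPBlock(nn.Module):
    """Hypothetical stand-in for MLPBlock; not the fusionlab implementation."""

    def __init__(self, hidden_size, mlp_dim, dropout_rate=0.0, act=nn.GELU):
        super().__init__()
        if not 0.0 <= dropout_rate <= 1.0:
            raise ValueError("dropout_rate should be between 0 and 1.")
        # Per the parameter description: mlp_dim == 0 falls back to hidden_size.
        mlp_dim = mlp_dim or hidden_size
        self.linear1 = nn.Linear(hidden_size, mlp_dim)
        self.act = act()  # `act` is a class, instantiated here (assumption)
        self.drop1 = nn.Dropout(dropout_rate)
        self.linear2 = nn.Linear(mlp_dim, hidden_size)
        self.drop2 = nn.Dropout(dropout_rate)

    def forward(self, x):
        # Expand to mlp_dim, apply the non-linearity, project back to hidden_size.
        x = self.drop1(self.act(self.linear1(x)))
        return self.drop2(self.linear2(x))


block = SketchMLPBlock(hidden_size=768, mlp_dim=3072)
tokens = torch.randn(2, 196, 768)  # (batch, sequence length, hidden_size)
out = block(tokens)
print(out.shape)  # the output keeps the input shape
```

Because the second linear layer projects back to hidden_size, the block can be placed inside a residual connection without reshaping.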
- forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
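The difference between calling the instance and calling forward() directly can be seen with a forward hook. This is a small sketch using a plain `nn.Linear` layer as a stand-in; the same behaviour is expected for any `nn.Module` subclass, including the classes on this page.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# A forward hook receives (module, inputs, output) after forward() has run.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.randn(3, 4)
layer(x)          # goes through Module.__call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()

print(calls)  # [(3, 2)] -- only the first call was recorded
```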
- training: bool
- class fusionlab.encoders.vit.vit.TransformerBlock(hidden_size, mlp_dim, num_heads, dropout_rate=0.0, qkv_bias=False, save_attn=False)
Bases: Module
A transformer block, based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”
- __init__(hidden_size, mlp_dim, num_heads, dropout_rate=0.0, qkv_bias=False, save_attn=False)
- Parameters:
hidden_size (int) – dimension of hidden layer.
mlp_dim (int) – dimension of feedforward layer.
num_heads (int) – number of attention heads.
dropout_rate (float, optional) – fraction of the input units to drop. Defaults to 0.0.
qkv_bias (bool, optional) – apply bias term for the qkv linear layer. Defaults to False.
save_attn (bool, optional) – to make accessible the attention matrix. Defaults to False.
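In multi-head attention, hidden_size is conventionally split evenly across num_heads. The helper below is a hypothetical sketch of that arithmetic; whether this class raises an error for a non-divisible pair is an assumption, not something this page states.

```python
def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head feature size, assuming hidden_size is split evenly across heads."""
    if hidden_size % num_heads != 0:
        # Assumed constraint: each head must receive the same number of features.
        raise ValueError("hidden_size should be divisible by num_heads.")
    return hidden_size // num_heads


print(head_dim(768, 12))  # 64
```

With the ViT defaults (hidden_size=768, num_heads=12), each head therefore attends over 64 features.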
- forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class fusionlab.encoders.vit.vit.ViT(in_channels, img_size, patch_size, hidden_size=768, mlp_dim=3072, num_layers=12, num_heads=12, pos_embed='conv', dropout_rate=0.0, spatial_dims=2, qkv_bias=False, save_attn=False)
Bases: Module
Vision Transformer (ViT), based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”
ViT supports TorchScript, but only with PyTorch versions after 1.8.
source code: Project-MONAI/MONAI
- __init__(in_channels, img_size, patch_size, hidden_size=768, mlp_dim=3072, num_layers=12, num_heads=12, pos_embed='conv', dropout_rate=0.0, spatial_dims=2, qkv_bias=False, save_attn=False)
- Parameters:
in_channels (int) – dimension of input channels.
img_size (Union[Sequence[int], int]) – dimension of input image.
patch_size (Union[Sequence[int], int]) – dimension of patch size.
hidden_size (int, optional) – dimension of hidden layer. Defaults to 768.
mlp_dim (int, optional) – dimension of feedforward layer. Defaults to 3072.
num_layers (int, optional) – number of transformer blocks. Defaults to 12.
num_heads (int, optional) – number of attention heads. Defaults to 12.
pos_embed (str, optional) – position embedding layer type. Defaults to “conv”.
num_classes (int, optional) – number of classes if classification is used. Defaults to 2.
dropout_rate (float, optional) – fraction of the input units to drop. Defaults to 0.0.
spatial_dims (int, optional) – number of spatial dimensions. Defaults to 2.
qkv_bias (bool, optional) – apply bias to the qkv linear layer in self attention block. Defaults to False.
save_attn (bool, optional) – to make accessible the attention in self attention block. Defaults to False.
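The relationship between img_size, patch_size, and spatial_dims determines how many tokens the transformer blocks see. The helper below is a hypothetical sketch of that arithmetic: it assumes an int is broadcast to every spatial dimension and that each image dimension is divisible by the patch size, neither of which is confirmed by this page.

```python
from math import prod
from typing import Sequence, Union


def num_patches(
    img_size: Union[Sequence[int], int],
    patch_size: Union[Sequence[int], int],
    spatial_dims: int = 2,
) -> int:
    """Number of non-overlapping patches, i.e. the token sequence length."""

    def as_tuple(value):
        # Assumption: a single int applies to every spatial dimension.
        return (value,) * spatial_dims if isinstance(value, int) else tuple(value)

    img, patch = as_tuple(img_size), as_tuple(patch_size)
    if len(img) != spatial_dims or len(patch) != spatial_dims:
        raise ValueError("img_size and patch_size must match spatial_dims.")
    if any(i % p != 0 for i, p in zip(img, patch)):
        raise ValueError("img_size should be divisible by patch_size.")
    return prod(i // p for i, p in zip(img, patch))


print(num_patches(224, 16))                           # 196 (14 x 14 patches)
print(num_patches((96, 96, 96), 16, spatial_dims=3))  # 216 (6 x 6 x 6 patches)
```

Under these assumptions, a 224x224 image with 16x16 patches yields a sequence of 196 tokens, each embedded into hidden_size features.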
- forward(x, return_features=False)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool