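Overall architecture:
As the figure above shows, Swin-Unet is built mainly from Swin Transformer Blocks, Patch Merging, and Patch Expanding. The left half is the part taken from the paper "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows". I cover Swin Transformer in detail, including a source-code walkthrough, in another article (Swin Transformer解读).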
Patch Expanding
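This module performs upsampling: it enlarges the spatial resolution and adjusts the number of channels. (The final Patch Expanding step enlarges the resolution by 4x.)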
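The shape bookkeeping of a single 2x Patch Expanding step can be sketched in plain Python. This is only an illustration of the arithmetic: the helper name `patch_expand_2x`, the starting shape `(7, 7, 768)`, and the choice of three stages are assumptions made for the example, and the final 4x step is not modelled.

```python
def patch_expand_2x(h, w, c):
    """Return the (height, width, channels) after one 2x Patch Expanding step.

    Hypothetical helper for illustration: it tracks shapes only, not values.
    """
    expanded_c = 2 * c  # the linear layer doubles the channels: c -> 2c
    assert expanded_c % 4 == 0, "channels must split evenly over a 2x2 block"
    # The rearrangement spreads the 2c channels over a 2x2 spatial block,
    # so each output position keeps 2c / 4 = c / 2 channels.
    return 2 * h, 2 * w, expanded_c // 4


shape = (7, 7, 768)  # assumed starting feature-map size, for illustration only
stages = [shape]
for _ in range(3):   # three 2x stages, also an assumption
    shape = patch_expand_2x(*shape)
    stages.append(shape)

print(stages)
# [(7, 7, 768), (14, 14, 384), (28, 28, 192), (56, 56, 96)]
```

Each step doubles the height and width and halves the channel count, mirroring what Patch Merging does in the opposite direction.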
import torch.nn as nn
from einops import rearrange


class PatchExpand(nn.Module):
    def __init__(self, input_resolution, dim, dim_scale=2, norm_layer=nn.LayerNorm):
        super().__init__()
        self.input_resolution = input_resolution
        self.dim = dim
        # Double the channels so they can be spread over a 2x2 spatial block.
        self.expand = nn.Linear(dim, 2 * dim, bias=False) if dim_scale == 2 else nn.Identity()
        self.norm = norm_layer(dim // dim_scale)

    def forward(self, x):
        """
        x: [B, H*W, c]
        """
        H, W = self.input_resolution
        x = self.expand(x)  # [B, H*W, 2c]
        B, L, C = x.shape   # from here on, C == 2c
        assert L == H * W, "input feature has wrong size"
        x = x.view(B, H, W, C)
        # Move each group of 4 channel chunks into a 2x2 spatial neighborhood.
        x = rearrange(x, 'b h w (p1 p2 c) -> b (h p1) (w p2) c', p1=2, p2=2, c=C // 4)  # [B, 2H, 2W, C//4]
        x = x.view(B, -1, C // 4)  # [B, 2H*2W, C//4]
        x = self.norm(x)
        return x
This operation is essentially the inverse of Patch Merging, as illustrated in the diagram below.
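The inverse relationship can be checked on the spatial rearrangement alone. The following is a minimal NumPy sketch, not the model code: it omits the linear and normalization layers, the helper names `merge_2x2` and `expand_2x2` are hypothetical, and the channel order in `merge_2x2` is chosen to match the `(p1 p2 c)` pattern used above, which may differ from the order in a particular Patch Merging implementation.

```python
import numpy as np


def merge_2x2(x):
    """Space-to-depth: [B, H, W, C] -> [B, H/2, W/2, 4C], channel order (p1, p2, c)."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // 2, 2, W // 2, 2, C)  # axes: b h p1 w p2 c
    x = x.transpose(0, 1, 3, 2, 4, 5)          # axes: b h w p1 p2 c
    return x.reshape(B, H // 2, W // 2, 4 * C)


def expand_2x2(x):
    """Depth-to-space: [B, H, W, 4C] -> [B, 2H, 2W, C].

    Same layout as 'b h w (p1 p2 c) -> b (h p1) (w p2) c' with p1 = p2 = 2.
    """
    B, H, W, C4 = x.shape
    C = C4 // 4
    x = x.reshape(B, H, W, 2, 2, C)            # axes: b h w p1 p2 c
    x = x.transpose(0, 1, 3, 2, 4, 5)          # axes: b h p1 w p2 c
    return x.reshape(B, 2 * H, 2 * W, C)


x = np.arange(2 * 4 * 6 * 3).reshape(2, 4, 6, 3)  # toy input [B=2, H=4, W=6, C=3]
merged = merge_2x2(x)           # shape (2, 2, 3, 12)
restored = expand_2x2(merged)   # shape (2, 4, 6, 3)
print(np.array_equal(restored, x))
```

In the real modules the learned linear layers sit between these rearrangements, so Patch Expanding mirrors Patch Merging structurally rather than reconstructing its input exactly.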
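Experimental results
With 32x downsampling and upsampling of the image, segmentation experiments on multiple medical organs show that the U-Net built purely on Swin Transformer outperforms fully convolutional networks as well as hybrids of Transformer and convolution.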