Detailed explanation of U-Net network structure

The U-Net architecture is symmetric, and its U shape gives the network its name. Overall, U-Net is an encoder-decoder structure, the same overall structure as FCN.

The left half of the network performs feature extraction, and the right half performs upsampling. The encoder is composed of convolution and downsampling. All convolutions use 3x3 kernels. U-Net uses valid convolution (not same convolution), so that the network relies only on information from the input image. If same convolution were used, each 3x3 convolution would leave the feature map size unchanged, and the final upsampled output would match the input size. However, padding introduces errors: the deeper the model, the more abstract the feature maps become, and the effect of padding accumulates. The padding is therefore set to 0, and without padding each 3x3 convolution reduces the height and width of the feature map by 2. After four pooling operations, the downsampling path covers 5 scales in total.
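The size arithmetic above can be traced with a short sketch. It assumes two 3x3 valid convolutions per scale, 2x2 pooling that halves the size, and a 572x572 input; those figures follow the original U-Net paper rather than this article, and the function names are hypothetical.

```python
def conv3x3_valid(size):
    """A 3x3 convolution with padding 0 and stride 1 trims one pixel per border."""
    return size - 2


def max_pool_2x2(size):
    """Assumed 2x2 pooling with stride 2: halves the spatial size."""
    return size // 2


def encoder_sizes(input_size, levels=5, convs_per_level=2):
    """Return the feature map side length at the end of each encoder scale."""
    sizes = []
    size = input_size
    for level in range(levels):
        for _ in range(convs_per_level):
            size = conv3x3_valid(size)
        sizes.append(size)
        # Pool between scales, but not after the deepest one.
        if level < levels - 1:
            size = max_pool_2x2(size)
    return sizes


print(encoder_sizes(572))  # [568, 280, 136, 64, 28]
```

Four pooling steps separate the five scales, and every scale loses 4 pixels per side to its two unpadded convolutions.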
The decoder restores the resolution of the feature map. Besides convolution, its key components are upsampling and skip connections. Upsampling restores the spatial resolution. Skip connections fuse the feature maps from the downsampling path into the upsampling path. The fusion method is concatenation: the feature maps are stacked along the channel dimension. Segmentation is then predicted from the resulting feature map. In practice, the feature maps being fused may differ in size, so the larger encoder feature map needs to be cropped. The last layer uses a 1x1 convolution for classification.
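The crop-then-concatenate fusion can be illustrated with a minimal sketch in plain Python. It assumes feature maps stored as nested lists in channels x height x width order and a center crop; the helper names and the toy sizes are hypothetical and chosen only for illustration.

```python
def center_crop(feature, target_h, target_w):
    """Crop a channels x height x width nested list around its center."""
    height, width = len(feature[0]), len(feature[0][0])
    top = (height - target_h) // 2
    left = (width - target_w) // 2
    return [
        [row[left:left + target_w] for row in channel[top:top + target_h]]
        for channel in feature
    ]


def concat_channels(first, second):
    """Stack two feature maps of equal spatial size along the channel axis."""
    assert len(first[0]) == len(second[0]), "heights differ"
    assert len(first[0][0]) == len(second[0][0]), "widths differ"
    return first + second


# Toy data: a 2-channel 6x6 encoder map and a 3-channel 4x4 decoder map.
encoder_map = [
    [[c * 100 + r * 6 + col for col in range(6)] for r in range(6)]
    for c in range(2)
]
decoder_map = [[[0] * 4 for _ in range(4)] for _ in range(3)]

cropped = center_crop(encoder_map, 4, 4)       # 2 channels, 4x4
fused = concat_channels(cropped, decoder_map)  # 5 channels, 4x4
print(len(fused), len(fused[0]), len(fused[0][0]))  # 5 4 4
```

Concatenation keeps the spatial size and adds the channel counts, which is why the two maps must match in height and width before they are fused.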

Origin blog.csdn.net/AKxiaokui/article/details/125038738