[Introduction to Object Detection] Anchor Boxes / Prior Boxes

Anchor Box/Prior Box

What is a prior box

A prior box is one of a set of detection boxes defined in advance (their sizes and aspect ratios are all fixed beforehand).

Why set prior boxes? A basic idea of object detection was introduced in the earlier section on basic concepts: first generate a large number of candidate boxes, then classify and fine-tune those candidates to complete detection. Prior boxes are how these candidates are generated.

Setting prior boxes of different scales

To cover more possible cases, several prior boxes of different scales are placed at the same position in the image. "Different scales" here refers to both size and aspect ratio.
Clearly, placing prior boxes of different scales makes it more likely that some prior box matches the target well: the prior box with the higher IoU with the object is the more likely match.
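
The IoU mentioned here is the intersection over union of two boxes. As a minimal sketch (the function name iou and the (x_min, y_min, x_max, y_max) corner-coordinate box format are assumptions for illustration, not code from this post):

def iou(box_a, box_b):
    """Compute the IoU of two boxes given in (x_min, y_min, x_max, y_max) corner coordinates."""
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A well-matched prior box has a high IoU, a poorly matched one a low IoU
print(iou([50, 50, 150, 150], [60, 60, 160, 160]))    # ~0.68
print(iou([50, 50, 150, 150], [200, 200, 300, 300]))  # 0.0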

The prior box corresponds to the feature map

In addition, prior boxes need to be placed at different positions in the image.

If we placed prior boxes at every pixel of the original image, then for a 224x224 image with 3 prior boxes per pixel position we would need 224x224x3 = 150528 prior boxes. If instead we traverse a downsampled feature map of the original image, the number of prior boxes is greatly reduced.
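
A quick sanity check of the two counts (a minimal arithmetic sketch in Python; the 7x7x9 setup matches the code further below):

# Prior boxes anchored at every pixel of a 224x224 image, 3 per position
print(224 * 224 * 3)  # 150528

# Prior boxes anchored at every cell of a 7x7 feature map, 9 per cell
print(7 * 7 * 9)      # 441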

Determining the category information of prior boxes

Each prior box must be assigned category information so that the model can learn to predict whether a given prior box corresponds to a target object.

The category is determined with an IoU threshold: a prior box whose IoU with the ground-truth box is below the threshold is labeled as background, while a prior box whose IoU with the ground-truth box is greater than or equal to the threshold is labeled as a target prior box. This yields the ground-truth category information.
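
As a minimal sketch of this assignment rule (the function name assign_labels, the 0.5 threshold value, and the iou helper from the sketch above are assumptions for illustration):

def assign_labels(prior_boxes, gt_boxes, gt_labels, iou_threshold=0.5):
    """Assign each prior box the label of its best-matching ground-truth box, or 0 (background)."""
    assigned = []
    for prior in prior_boxes:
        # IoU of this prior box with every ground-truth box
        ious = [iou(prior, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda k: ious[k])
        if ious[best] >= iou_threshold:
            assigned.append(gt_labels[best])  # target prior box
        else:
            assigned.append(0)                # background
    return assigned

print(assign_labels([[40, 40, 140, 140]], [[50, 50, 150, 150]], gt_labels=[1]))  # [1], since IoU ~0.68 >= 0.5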

Generating prior boxes

The final feature map from VGG16 is 7*7, and 9 candidate boxes (3 scales and 3 aspect ratios) are set for each position. Take the following code as an example:

"""
设置细节介绍:
1. 离散程度 fmap_dims = 7: VGG16最后的特征图尺寸为 7*7
2. 在上面的举例中我们是假设了三种尺寸的先验框,然后遍历坐标。在先验框生成过程中,先验框的尺寸是提前设置好的,
   特征图上每一个cell定义了共9种不同大小和形状的候选框(3种尺度*3种长宽比=9)

生成过程:
0. cx, cy表示中心点坐标
1. 遍历特征图上每一个cell,i+0.5是为了从坐标点移动至cell中心,/fmap_dims目的是将坐标在特征图上归一化
2. 这个时候我们已经可以在每个cell上各生成一个框了,但是这个不是我们需要的,我们称之为base_prior_bbox基准框。
3. 根据我们在每个cell上得到的长宽比1:1的基准框,结合我们设置的3种尺度obj_scales和3种长宽比aspect_ratios就得到了每个cell的9个先验框。
4. 最终结果保存在prior_boxes中并返回。

需要注意的是,这个时候我们的到的先验框是针对特征图的尺寸并归一化的,因此要映射到原图计算IOU或者展示,需要:
img_prior_boxes = prior_boxes * 图像尺寸
"""

from math import sqrt

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def create_prior_boxes():
    """
    The final VGG16 feature map is 7*7.
    Each cell of the feature map defines 9 candidate boxes of different sizes and shapes (3 scales * 3 aspect ratios = 9),
    so the total number of candidate boxes = 7 * 7 * 9 = 441.
    :return: prior boxes in center-size coordinates, a tensor of dimensions (441, 4)
    """
    fmap_dims = 7
    obj_scales = [0.2, 0.4, 0.6]
    aspect_ratios = [1., 2., 0.5]

    prior_boxes = []
    for i in range(fmap_dims):
        for j in range(fmap_dims):
            cx = (j + 0.5) / fmap_dims  # normalized x coordinate of the cell center
            cy = (i + 0.5) / fmap_dims  # normalized y coordinate of the cell center

            for obj_scale in obj_scales:
                for ratio in aspect_ratios:
                    # sqrt() is used directly here, assuming the original image has equal width and height
                    prior_boxes.append([cx, cy, obj_scale * sqrt(ratio), obj_scale / sqrt(ratio)])

    prior_boxes = torch.FloatTensor(prior_boxes).to(device)  # (441, 4)
    prior_boxes.clamp_(0, 1)  # (441, 4)

    return prior_boxes
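
A short usage sketch of create_prior_boxes (the 224x224 image size is an assumption for illustration; the scaling follows the img_prior_boxes = prior_boxes * image size note in the comments above):

prior_boxes = create_prior_boxes()   # (441, 4), normalized center-size (cx, cy, w, h)
print(prior_boxes.shape)             # torch.Size([441, 4])

# Map the normalized boxes to a 224x224 image for IoU computation or visualization
img_prior_boxes = prior_boxes * 224  # assumes a square 224x224 input image
print(img_prior_boxes[0])            # the first prior box in image-scale coordinates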

To visualize:

1. obj_scales = [0.1, 0.2, 0.3]
2. obj_scales = [0.2, 0.4, 0.6]

When a prior box exceeds the image boundary, the image size is generally used to clip the out-of-bounds prior box. For example, if the top-left corner of a prior box is (-5, -9), it is clipped to (0, 0); if the bottom-right corner of a prior box is (324, 134) and the image size is (224, 224), it is clipped to (224, 134).

prior_boxes.clamp_(0, 1)

The line above implements this clipping in the code; since the coordinates are normalized, the clipping range is [0, 1].
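
As a minimal sketch of the same clipping in image (pixel) coordinates, reproducing the example numbers above (the corner-coordinate box format is an assumption for illustration):

import torch

# One corner-format box (x_min, y_min, x_max, y_max) in pixel coordinates
boxes = torch.tensor([[-5., -9., 324., 134.]])

# Clip to a 224x224 image: x coordinates and y coordinates both to [0, 224]
boxes[:, 0::2].clamp_(0, 224)
boxes[:, 1::2].clamp_(0, 224)
print(boxes)  # tensor([[  0.,   0., 224., 134.]])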

Reference

Hands-on learning CV-Pytorch

Original article: blog.csdn.net/i0o0iW/article/details/111409885