MMDetection Framework Getting Started Tutorial (2): Quick Start Tutorial

  I came to MMDetection from TensorFlow, and I was a little confused when I first encountered the framework, because it wraps several layers of abstraction on top of PyTorch. The advantage is that the coupling between modules is very low, which makes them easy to swap out; the downside is that for a novice like me (who doesn't know PyTorch well), it is hard to understand the running flow of the whole framework at first glance, or even to know where to look for the corresponding source code, let alone build a network from scratch.

  After a lot of searching, I strongly recommend this explanation video from Xi'an Jiaotong University on Bilibili. It is very beginner-friendly and explains how to use MMDetection very clearly. This article summarizes and expands on that video, combined with the tutorials from the official OpenMMLab account, in the hope of helping beginners who are as confused as I was.

  1. Bilibili - mmdetection tutorial
  2. Zhihu - Easily master the overall construction process of MMDetection (1)
  3. Zhihu - Easily master the overall construction process of MMDetection (2)
  4. Zhihu - Easily master the commonly used algorithms in MMDetection (1): RetinaNet and detailed configuration
  5. Official documentation - MMDetection Tutorial

1. What is MMDetection?

  MMDetection is a member of the OpenMMLab family and is mainly responsible for 2D object detection (MMDetection3D, for example, handles 3D object detection). First, we need to understand why MMDetection exists. There are currently many object detection algorithms, the methods are complex, and there are many details, so reproducing them individually is very difficult. Moreover, because there was no shared platform or unified specification, even when someone successfully implemented an algorithm, it was hard for others to reuse it.

  So SenseTime and the Chinese University of Hong Kong gathered a group of people to reproduce most of the current mainstream and cutting-edge models under a unified code specification, such as the Faster R-CNN series, the YOLO series, and the newer DETR (as shown in the figure below), and provided pre-trained models. Others only need to follow this specification to use these models directly without re-implementing them, and this specification is MMDetection. On top of the rich model zoo, MMDetection also supports custom extensions: you can modify existing models or build a brand-new model from scratch, which basically meets the needs of both academic research and industrial deployment.

2. Overall algorithm flow

  All object detection algorithms can be abstracted into several modules according to the training and testing process. For beginners, it is enough to understand the input, output, and function of each module. This post will not expand on the implementation logic inside each module; I will analyze that in separate posts later. This flow also corresponds to how the framework's code is organized, so it is very important to understand this diagram.

2.1 Training process

  The training process involves 9 core components, although not every algorithm needs all of them, as shown in the table below.

Module name | Required | Function
Backbone | Yes | [Feature extraction] e.g. the ResNet series
Neck | No | [Feature enhancement] fuses and enhances the Backbone features, e.g. FPN
Head | Yes | [Feature decoding] the most important part of a detection network; the Head decodes the feature maps into the outputs the algorithm expects, such as object categories and box coordinates. It is split into DenseHead and RoIHead for one-stage and two-stage detectors respectively
BBox Assigner | Yes | [Positive/negative sample assignment] since the number of boxes output by the detection network rarely matches the number of ground-truth objects, positive and negative samples must be assigned first; different assignment strategies bring significant performance differences, so this module is critical
BBox Sampler | No | [Positive/negative sample balancing] in typical detection data the number of ground-truth objects is very small, so the ratio of positive to negative samples is far less than 1; to avoid overfitting caused by this extreme imbalance, positive and negative samples are sampled to balance their numbers
BBox Encoder | Yes | [Encoding transformation] to converge better and balance the losses, the network's regression targets are transformed by a specific encoding, such as normalization (see the sketch below); the Encoder output can be regarded as the final output of the model's forward pass
Loss | Yes | [Loss computation] detection losses are generally split into a classification loss and a regression loss, which drive the iterative optimization of the model
Enhance | No | [Feature enhancement] generally refers to plug-and-play modules that enhance features, such as Dropout, DropBlock, etc.
Training Tricks | No | [Training tricks] the familiar model-tuning techniques, such as early stopping, learning-rate scheduling, etc.
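
  To make the BBox Encoder row above more concrete, here is a minimal sketch of the classic delta-XYWH encoding used by anchor-based detectors such as RetinaNet (simplified for illustration; this is my own sketch, not MMDetection's DeltaXYWHBBoxCoder implementation):

import math

def encode_delta_xywh(anchor, gt):
    """anchor and gt are (x1, y1, x2, y2) boxes; returns the (dx, dy, dw, dh) regression target."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    ax, ay = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    # the network regresses these normalized offsets instead of raw box coordinates
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))

print(encode_delta_xywh((0, 0, 100, 100), (10, 10, 90, 110)))   # (0.0, 0.1, -0.223..., 0.0)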

2.2 Test process

  Compared with training, testing only involves the forward inference of the model, so operations such as positive/negative sample assignment, sampling, and loss computation are not needed, and the flow is simpler. The table below lists the modules specific to the testing flow.

Module name | Required | Function
BBox Decoder | Yes | [Decoding transformation] the counterpart of the BBox Encoder used during training: however the targets were encoded during training, they are decoded the same way during testing
BBox PostProcess | Yes | [Post-processing] the predicted boxes may overlap, so the outputs generally need to be filtered by IoU or confidence; the most common method is NMS (see the example below)
Testing Tricks | No | [Testing tricks] such as model ensembling, multi-scale testing, etc.
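
  As a concrete illustration of the BBox PostProcess step, here is a tiny NMS example using torchvision (for illustration only; MMDetection itself uses the NMS operators provided by MMCV):

import torch
from torchvision.ops import nms

boxes = torch.tensor([[0., 0., 100., 100.],
                      [5., 5., 105., 105.],      # heavily overlaps the first box
                      [200., 200., 300., 300.]])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.5)     # indices of the boxes kept after suppression
print(keep)                                      # tensor([0, 2])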

3. Algorithm construction process

  Taking the training process as an example: with raw TensorFlow or PyTorch we have to write the code for data loading, preprocessing, augmentation, the model, the loss function, and the training strategy, and finally wire it all into a train() function, which is very cumbersome. Since MMDetection already implements most of these steps, we only need to call the ready-made components: configure the parameters of the corresponding methods in a Config file and pass that Config file to MMDetection's built-in train.py, and the framework will parse the Config file, automatically call the configured methods, and complete the training process. So to build an algorithm on MMDetection there are only three things to do: prepare the dataset, write the Config file, and call the built-in train.py to start training.

  Let's start with the RetinaNet implementation that comes with MMDetection and walk through training and testing on the COCO dataset.

3.1 Prepare the dataset

  MMDetection already implements the handling of the COCO dataset, so we will use the COCO 2014 dataset directly here. The directory structure of the downloaded dataset is shown in the figure below: the annotations folder stores annotation data in JSON format, and the bounding-box annotations are in the instances files.
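
  As a quick sanity check of the downloaded annotations, you can load them with pycocotools (a sketch, assuming pycocotools is installed and the dataset is placed under the paths used later in this post):

from pycocotools.coco import COCO

coco = COCO('E:/Dataset/COCO2014/annotations/instances_train2014.json')
print(len(coco.getImgIds()), 'images,', len(coco.getCatIds()), 'categories')   # expect 80 categories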

3.2 Write Config file

  The configuration files of RetinaNet are located under ./configs/retinanet in the MMDetection source tree. Open the directory and you will find many Config files in it. The file names follow this rule:

{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset}

Curly brackets indicate required fields and square brackets indicate optional ones. For example, the configuration file we will use next, retinanet_r50_fpn_1x_coco.py, means: the model is RetinaNet, the backbone is ResNet50, the Neck is FPN, training runs for 12 epochs (1x means 12 epochs, 2x means 24), and the dataset is COCO. More detailed field descriptions can be found in the official documentation.

  But when we open the configuration file retinanet_r50_fpn_1x_coco.py, we find that it only contains a few lines of code:

_base_ = [
    '../_base_/models/retinanet_r50_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)

  In fact, configuration files in MMDetection are built through inheritance + modification (继承 + 修改): the _base_ = [...] list at the top of a configuration file lists the configuration files to inherit from, and the attributes to change are then overridden below it. If you want to see the full configuration, you do not need to trace the _base_ chain level by level; you can print the resolved configuration with the official tool ./tools/misc/print_config.py:

python ./tools/misc/print_config.py ./configs/retinanet/retinanet_r50_fpn_1x_coco.py

  You will then see the complete configuration corresponding to retinanet_r50_fpn_1x_coco.py, and each module mentioned in Section 2 can find its definition in it. The configuration file consists of a series of dict dictionaries and variable definitions. After being loaded by Config.fromfile(filepath), it is returned as a Config object (a data structure from MMCV), and the MMDetection framework can then build the corresponding modules from this Config, for example via build_detector().
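
  For reference, the same loading and building flow can be done in a few lines of Python (a sketch against the MMDetection 2.x API):

from mmcv import Config
from mmdet.models import build_detector

cfg = Config.fromfile('./configs/retinanet/retinanet_r50_fpn_1x_coco.py')
print(cfg.pretty_text)        # same output as print_config.py

# train_cfg/test_cfg are nested inside cfg.model in this config
model = build_detector(cfg.model)
print(type(model))            # mmdet.models.detectors.retinanet.RetinaNet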

  Specifically, build_detector() first finds the corresponding class according to the type key of the dictionary: the class name is the string value of type, and that class must have been registered in advance (Registry) so that MMDetection can look it up by this value, otherwise an error is reported. For example, in the configuration file below, the type of model is 'RetinaNet', and we can find its definition in ./mmdet/models/detectors/retinanet.py.

@DETECTORS.register_module()		# indicates that this class has been registered
class RetinaNet(SingleStageDetector):
    """Implementation of `RetinaNet <https://arxiv.org/abs/1708.02002>`_"""

    def __init__(self,
                 backbone,
                 neck,
                 bbox_head,
                 train_cfg=None,
                 test_cfg=None,
                 pretrained=None,
                 init_cfg=None):
        super(RetinaNet, self).__init__(backbone, neck, bbox_head, train_cfg,
                                        test_cfg, pretrained, init_cfg)

  We can see that the parameters of the RetinaNet constructor correspond exactly to the other keys of the dictionary containing type='RetinaNet' in the configuration file. So the job of build_detector() is: find the corresponding class according to the type of the dict, initialize that class with the remaining parameters in the dict, and return the instance.

# The following two calls are conceptually equivalent
model = build_detector(dict(type='RetinaNet', backbone=xxx, neck=xxx, bbox_head=xxx))
model = RetinaNet(backbone=xxx, neck=xxx, bbox_head=xxx)
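
  To give a feel for what "registered in advance" means, here is a deliberately simplified toy registry (my own sketch, not MMCV's actual Registry class, which is covered in the next post):

class SimpleRegistry:
    def __init__(self):
        self._modules = {}

    def register_module(self):
        def _register(cls):
            self._modules[cls.__name__] = cls     # record the class under its name
            return cls
        return _register

    def build(self, cfg):
        cfg = dict(cfg)
        cls = self._modules[cfg.pop('type')]      # look up the class by the 'type' string
        return cls(**cfg)                         # instantiate it with the remaining keys

DETECTORS = SimpleRegistry()

@DETECTORS.register_module()
class ToyDetector:
    def __init__(self, num_classes):
        self.num_classes = num_classes

model = DETECTORS.build(dict(type='ToyDetector', num_classes=80))
print(type(model).__name__, model.num_classes)    # ToyDetector 80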

  The dicts in the configuration file can be nested. For example, the backbone attribute of the model is itself a dictionary with type='ResNet'. Similarly, we can find the definition of the ResNet class in ./mmdet/models/backbones/resnet.py, and the keys of the dictionary match its constructor arguments.

@BACKBONES.register_module()
class ResNet(BaseModule):
    """ResNet backbone."""

    def __init__(self,
                 depth,
                 in_channels=3,
                 stem_channels=None,
                 base_channels=64,
                 num_stages=4,
                 strides=(1, 2, 2, 2),
                 dilations=(1, 1, 1, 1),
                 out_indices=(0, 1, 2, 3),
                 style='pytorch',
                 deep_stem=False,
                 avg_down=False,
                 frozen_stages=-1,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN', requires_grad=True),
                 norm_eval=True,
                 dcn=None,
                 stage_with_dcn=(False, False, False, False),
                 plugins=None,
                 with_cp=False,
                 zero_init_residual=True,
                 pretrained=None,
                 init_cfg=None):
        super(ResNet, self).__init__(init_cfg)
        self.zero_init_residual = zero_init_residual
        if depth not in self.arch_settings:
            raise KeyError(f'invalid depth {depth} for resnet')
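
  If you want to play with a single component, the backbone can be built and run on its own from such a nested dict (a sketch, assuming MMDetection 2.x and PyTorch are installed):

import torch
from mmdet.models import build_backbone

backbone = build_backbone(dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3)))
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))
for i, f in enumerate(feats):
    print(i, f.shape)     # one feature map per ResNet stage, at strides 4/8/16/32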

  Below is the complete configuration of retinanet_r50_fpn_1x_coco.py.

Config:
# 1. Model configuration
model = dict(
    type='RetinaNet',		# model name
    # 1.1 Backbone configuration
    backbone=dict(
        type='ResNet',		# the Backbone is ResNet50 (4 stages, 50 layers)
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),	# output the feature maps of ResNet50 stages 1-4 for the multi-scale fusion done later in FPN
        frozen_stages=1,			# since a pre-trained model is used, freeze the parameters of the first ResNet50 stage so they are not updated during training
        norm_cfg=dict(type='BN', requires_grad=True),	# normalization layer configuration
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),		# use the ResNet50 weights pre-trained on ImageNet provided by torchvision
    # 1.2 Neck configuration
    neck=dict(
        type='FPN',		# the Neck is FPN
        in_channels=[256, 512, 1024, 2048],		# input channels match the feature-map dimensions of the four ResNet50 stages
        out_channels=256,						# output feature dimension is 256
        start_level=1,							# start from the first-stage feature map of the Backbone
        add_extra_convs='on_input',
        num_outs=5),
    # 1.3 Head configuration
    bbox_head=dict(
        type='RetinaHead',	# the Head is RetinaHead
        num_classes=80,		# the COCO dataset contains 80 object classes
        in_channels=256,	# the FPN output feature dimension is 256
        stacked_convs=4,
        feat_channels=256,
        # 1.3.1 RetinaNet is an anchor-based method, so anchors must be generated
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        # 1.3.2 BBox Encoder configuration
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        # 1.3.3 Classification loss
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        # 1.3.4 Regression loss
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    # 1.4 Training configuration
    train_cfg=dict(
    	# 1.4.1 BBox Assigner
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    # 1.5 Test configuration
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))

# 2. Data configuration
data = dict(
    samples_per_gpu=2,		# batch size
    workers_per_gpu=2,		# number of data-loading workers per GPU, affects dataloader speed
    # 2.1 Training set configuration
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/',
        # data preprocessing pipeline
        pipeline=[
            dict(type='LoadImageFromFile'),									
            dict(type='LoadAnnotations', with_bbox=True),					
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),	
            dict(type='RandomFlip', flip_ratio=0.5),						
            dict(															
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),								
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    # 2.2 Validation set configuration
    val=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    # 2.3 Test set configuration
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
# evaluation hook configuration
evaluation = dict(interval=1, metric='bbox')
# optimizer configuration
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
# optimizer hook configuration
optimizer_config = dict(grad_clip=None)
# learning rate schedule configuration
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
# Runner configuration
runner = dict(type='EpochBasedRunner', max_epochs=12)
# checkpoint configuration
checkpoint_config = dict(interval=1)
# logger hook configuration
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
# custom hook configuration
custom_hooks = [dict(type='NumClassCheckHook')]
# distributed training configuration
dist_params = dict(backend='nccl')
# log level
log_level = 'INFO'
# path of a pre-trained model to load
load_from = None
# checkpoint to resume from
resume_from = None
# Runner workflow
workflow = [('train', 1)]

  As you can see from the configuration file, the pre-trained model is downloaded from torchvision by default, the dataset paths and GPU count do not match my setup, and because of limited storage I do not want to save a checkpoint every epoch. So I created a new configuration file, my_retinanet_r50_fpn.py, that inherits the official configuration file and makes a few modifications:

_base_ = [
    'D:/Program Files/OpenSourceLib/mmdetection/configs/retinanet/retinanet_r50_fpn_1x_coco.py'
]

model = dict(
    backbone=dict(
        init_cfg=None)		# do not download the pre-trained model from torchvision; use the one I downloaded locally instead
)

data = dict(
    samples_per_gpu=2,		# batch_size = 2
    workers_per_gpu=1,		# number of data-loading workers per GPU, affects dataloader speed
    train=dict(
        type='CocoDataset',
        ann_file='E:/Dataset/COCO2014/annotations/instances_train2014.json',	# change the dataset paths
        img_prefix='E:/Dataset/COCO2014/train2014'),
    val=dict(
        type='CocoDataset',
        ann_file='E:/Dataset/COCO2014/annotations/instances_val2014.json',
        img_prefix='E:/Dataset/COCO2014/val2014/'),
    test=dict(
        type='CocoDataset',
        ann_file='E:/Dataset/COCO2014/annotations/instances_val2014.json',
        img_prefix='E:/Dataset/COCO2014/val2014/')
)

evaluation = dict(interval=12, metric='bbox')	# evaluate once every 12 epochs

checkpoint_config = dict(interval=2)	# save a checkpoint every 2 epochs

load_from = '../ckpts/resnet50-0676ba61.pth'	# path to the pre-trained model I downloaded myself
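
  Before launching the training, it can be worth loading the new file once to confirm that the inherited fields were overridden as expected (a small sketch):

from mmcv import Config

cfg = Config.fromfile('my_retinanet_r50_fpn.py')
print(cfg.data.train.ann_file)           # should point to the COCO2014 annotation file
print(cfg.checkpoint_config.interval)    # should print 2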

3.3 Training Network

  Once the configuration file is written, you can directly call ./tools/train.py with the configuration file to start training. train.py parses the model configuration, dataset configuration, training configuration, Hook configuration, and so on, and builds the training process from that information. User-defined operations can be added through Hooks, so train.py itself generally does not need to be modified.

python train.py my_retinanet_r50_fpn.py

  Successfully started training:
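
  For the curious, the following is roughly what ./tools/train.py does internally, stripped down to its core (a simplified sketch against the MMDetection 2.x API; the real script also handles distributed training, seeds, logging, and more):

from mmcv import Config
from mmdet.apis import train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector

cfg = Config.fromfile('my_retinanet_r50_fpn.py')
cfg.work_dir = './work_dirs/my_retinanet_r50_fpn'   # where logs and checkpoints are written
cfg.gpu_ids = [0]
cfg.seed = None

model = build_detector(cfg.model)
datasets = [build_dataset(cfg.data.train)]
model.CLASSES = datasets[0].CLASSES                 # attach class names for later inference

train_detector(model, datasets, cfg, distributed=False, validate=True)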

4. Summary

  This article used the RetinaNet model already implemented in MMDetection, trained on COCO, as an example to demonstrate MMDetection's model training process. In general, there are three steps:

  1. Prepare the dataset.
  2. Prepare the configuration file: the configuration file is composed of a series of dicts; the type key of each dict names a registered class, and the build functions instantiate the corresponding class by reading that type. A configuration file generally inherits a common base configuration and then adjusts it as needed.
  3. Start training: call MMDetection's built-in train.py for training.

  If you need to build your own model, you have to implement a class and then register it (see the sketch below). The mechanisms of Registry and Hook will be covered in the next post.
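
  As a small taste of that, registering a new component is essentially just decorating the class (MyBackbone here is a hypothetical example of my own, shown only as a sketch; the real requirements, such as inheriting BaseModule, are discussed in the next post):

import torch.nn as nn
from mmdet.models.builder import BACKBONES

@BACKBONES.register_module()
class MyBackbone(nn.Module):              # simplified: real backbones usually inherit BaseModule
    def __init__(self, out_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(3, out_channels, 3, stride=2, padding=1)

    def forward(self, x):
        return (self.conv(x),)            # return a tuple of feature maps, like other backbones

# It can then be referenced from a config as: backbone=dict(type='MyBackbone', out_channels=256)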
