Paper reading notes: Detecting Faces Using Region-based Fully Convolutional Networks

First, motivation and previous work

Challenges in face detection:

Real images are complex and varied: faces may be occluded, appear at different scales, show different facial expressions, and vary with the lighting conditions and the pose of the person.

 

Idea:

When the paper was written, region-based methods had already achieved great success in face detection. However, directly applying a region-specific operation to a fully convolutional network (FCN) such as ResNet leads to reduced accuracy. R-FCN was later proposed to address this problem of FCNs. R-FCN shares the ConvNet computation over the entire image, which improves both training and testing efficiency.

Compared with R-CNN, R-FCN combines an FCN with a region-based module and uses fewer region-wise layers to balance the learning of classification and detection.

Q: Why does applying a region-specific operation directly to an FCN reduce accuracy? R-FCN is said to solve the localization problem, so how does it localize?

 

Answering this requires reading the related papers; to be filled in later.

 

 

Second, the contributions of this paper

1, It integrates several novel and effective techniques so that the special properties of faces are taken into account;

2, It introduces a novel position-sensitive average pooling. This pooling re-weights the responses on the score maps, which reduces the effect of the non-uniform contribution of each facial part; in other words, it uses a weighted average rather than a plain average.

Third, the network structure

 

1, R-FCN-based structure

A 101-layer ResNet is used as the backbone for feature extraction. It extracts highly representative image features with a large receptive field. The last stage of the ResNet uses atrous (dilated) convolution, which keeps the scale of the feature map so that the larger receptive field does not come at the cost of lost context information. This context information is beneficial when detecting tiny faces.
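To make the atrous convolution point concrete, here is a minimal sketch of the standard arithmetic for a dilated kernel. The function names and the example sizes (a 64-pixel feature map, dilation rate 2) are assumptions chosen for illustration; they are not taken from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a kernel with the given dilation rate."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_output_size(input_size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output length of a convolution along one axis (floor division)."""
    k_eff = effective_kernel_size(kernel_size, dilation)
    return (input_size + 2 * padding - k_eff) // stride + 1


# Hypothetical sizes: a 3x3 kernel on a 64-pixel-wide feature map.
strided = conv_output_size(64, 3, stride=2, padding=1, dilation=1)  # downsamples
atrous = conv_output_size(64, 3, stride=1, padding=2, dilation=2)   # keeps size
```

In this example the strided convolution halves the feature map, while the dilated one covers a 5-pixel span per axis and keeps the map at its original size, which is the trade-off the notes describe.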

 

2, Position-sensitive average pooling

Position-sensitive average pooling replaces the original global average pooling for the final feature voting (all bins vote on the final classification). Whereas global average pooling takes a plain average, position-sensitive average pooling takes a weighted average over the n × n feature map, using n × n trainable weights. The reason is that, in face detection, different positions of the face may deserve different levels of attention. For example, the eyes may matter more than the mouth, so the positions should not be averaged uniformly. A weighted average lets the network identify faces better.

 

In the pooling formula, w_j is the j-th trainable weight and N is the side length of the pooled RoI, so there are N × N weights.
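As a minimal sketch of the weighted average described above, the code below computes Y = (1 / N²) · Σ_j w_j · x_j for one N × N pooled score map. The normalisation by N², the nested-list layout and the function names are assumptions made for this illustration; in a real network the weights would be learned during training.

```python
def ps_average_pool(score_map, weights):
    """Weighted average of an N x N score map with one weight per position."""
    n = len(score_map)
    total = 0.0
    for row_x, row_w in zip(score_map, weights):
        for x, w in zip(row_x, row_w):
            total += w * x
    return total / (n * n)


def global_average_pool(score_map):
    """Plain average: the special case where every weight equals 1."""
    n = len(score_map)
    uniform = [[1.0] * n for _ in range(n)]
    return ps_average_pool(score_map, uniform)
```

With all weights equal to 1 the result reduces to global average pooling; non-uniform weights let some positions (for example the bins covering the eyes) contribute more to the vote.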

 

3, Multi-scale training and testing

In training, the shortest side of the input is resized to 1024 or 1200 pixels. This keeps the model robust when detecting faces at different scales, especially tiny faces. OHEM (Online Hard Example Mining) is applied to the negative samples, with the ratio of positive to negative samples set to 1:3. OHEM is an effective bootstrapping technique.

In testing, an image pyramid is built. Each scale is tested independently, and the results from the different scales are merged into the final result for the image.
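The 1:3 positive-to-negative ratio with hard negative mining can be sketched as follows. This is only an illustration under assumptions: the function name, the use of per-sample loss as the hardness measure, and the plain-list inputs are hypothetical and not taken from the paper's code.

```python
def select_hard_negatives(num_positives, negative_losses, neg_pos_ratio=3):
    """Return indices of the highest-loss negatives, keeping at most
    neg_pos_ratio negatives per positive sample."""
    budget = min(len(negative_losses), neg_pos_ratio * num_positives)
    # Rank negatives from hardest (largest loss) to easiest.
    ranked = sorted(range(len(negative_losses)),
                    key=lambda i: negative_losses[i],
                    reverse=True)
    return ranked[:budget]
```

Only the selected negatives would then contribute to the loss, so easy background regions do not dominate training.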
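One way the per-scale results could be merged is sketched below. The notes do not say how the merge is done, so the greedy non-maximum suppression used here, the IoU threshold of 0.3, and all function names are assumptions for illustration only.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_pyramid_detections(per_scale, iou_threshold=0.3):
    """per_scale maps a resize factor to detections (x1, y1, x2, y2, score)
    given in the coordinates of the resized image."""
    boxes = []
    for scale, detections in per_scale.items():
        for x1, y1, x2, y2, score in detections:
            # Map each box back to the original image coordinates.
            boxes.append((x1 / scale, y1 / scale, x2 / scale, y2 / scale, score))
    boxes.sort(key=lambda b: b[4], reverse=True)
    kept = []
    for box in boxes:  # greedy NMS: keep a box only if it overlaps no kept box
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

Detections of the same face found at several scales collapse into the highest-scoring box, while faces found at only one scale are kept.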

 

Fourth, analysis

1, Comparison between R-FCN and Faster R-CNN

R-FCN uses a deeper CNN, and because computation is shared over the entire image this does not add speed overhead;

Position-sensitive RoI pooling is used: location information is encoded into each RoI, and pooling over a group of feature maps outputs score maps for precise localization;

Because no fully connected layers are embedded in the ResNet structure, the feature maps learned by R-FCN carry richer information, which makes it easier for the network to learn class scores and bounding-box locations.
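The position-sensitive RoI pooling mentioned in this comparison can be sketched for a single class as follows. The idea is that the RoI is split into k × k bins and each bin is average-pooled from its own dedicated score map. The nested-list layout, the exclusive RoI end coordinates, and the function name are assumptions for this illustration, and the RoI is assumed to lie inside the score maps.

```python
import math


def ps_roi_pool(score_maps, roi, k):
    """score_maps: list of k*k 2-D maps (one per bin) for a single class.
    roi: (x1, y1, x2, y2) in feature-map coordinates, end-exclusive.
    Returns a k x k grid; bin (i, j) is pooled only from map i * k + j."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / k
    bin_h = (y2 - y1) / k
    pooled = [[0.0] * k for _ in range(k)]
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            y_start = math.floor(y1 + i * bin_h)
            y_end = max(y_start + 1, math.ceil(y1 + (i + 1) * bin_h))
            x_start = math.floor(x1 + j * bin_w)
            x_end = max(x_start + 1, math.ceil(x1 + (j + 1) * bin_w))
            channel = score_maps[i * k + j]
            values = [channel[y][x]
                      for y in range(y_start, y_end)
                      for x in range(x_start, x_end)]
            pooled[i][j] = sum(values) / len(values)
    return pooled
```

The k × k output is what the voting step then aggregates into a single score for the RoI.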

 

2, Experimental setup

 

Experiments are run on the WIDER FACE and FDDB datasets. WIDER FACE contains 32,203 images with 393,703 annotated faces, split into training, validation and test sets at 40%, 10% and 50%. The validation and test sets are further divided into three subsets by detection difficulty (Easy, Medium, Hard). FDDB contains 2,845 images with 5,171 annotated faces.

 

Origin www.cnblogs.com/fanzhongjie/p/11440363.html