Person Re-Identification: Pose Detection


Preface

Based on how image features are extracted for matching, person re-identification methods can be divided into those based on global features and those based on local features. Global features are relatively simple: the network extracts a single feature from the entire image, without attending to any local information. Ordinary convolutional networks extract global features.
However, as pedestrian datasets become more and more complex, global features alone can no longer meet performance requirements, so extracting richer local features has become a research hotspot.
Local features are obtained by manually or automatically directing the network's attention to key local regions and then extracting features from those regions. Common strategies for extracting local features include slicing the image into stripes, locating skeleton key points, and segmenting the pedestrian foreground.
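As a minimal illustration of the first strategy (slicing the image into horizontal stripes), the sketch below partitions a convolutional feature map into stripes and pools each one into a local feature. The shapes and stripe count are arbitrary choices, not taken from any particular paper:

```python
import numpy as np

def horizontal_stripes(feature_map, n_stripes=6):
    """Split a (C, H, W) feature map into horizontal stripes and
    average-pool each stripe into a C-dim local feature."""
    c, h, w = feature_map.shape
    edges = np.linspace(0, h, n_stripes + 1).astype(int)
    return [feature_map[:, a:b, :].mean(axis=(1, 2))
            for a, b in zip(edges[:-1], edges[1:])]

fmap = np.random.rand(256, 24, 8)        # e.g. a backbone's output map
stripe_feats = horizontal_stripes(fmap)  # 6 local features, 256-dim each
```

Each stripe then contributes its own descriptor, instead of one vector summarizing the whole image.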

Global features

A single feature is extracted from the global information of each pedestrian image; this global feature carries no spatial information.
A simple convolutional neural network produces one feature for the whole picture, called the global feature. This approach has defects: noisy regions can strongly interfere with the global feature, and misaligned poses also cause global features to mismatch.
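The loss of spatial information can be seen directly: a global feature is typically produced by pooling the backbone's feature map over all spatial positions. A minimal sketch (the shapes are illustrative):

```python
import numpy as np

def global_feature(feature_map):
    """Global average pooling: collapse a (C, H, W) conv feature map
    into a single C-dim vector, discarding all spatial information."""
    return feature_map.mean(axis=(1, 2))

fmap = np.random.rand(512, 24, 8)
g = global_feature(fmap)  # one 512-dim vector for the whole image
```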

Methods based on local features

Local features are obtained by extracting features from particular regions of the image; the multiple local features are finally merged into the final descriptor.

Local features: pose detection

Using human pose key points to align local features is a common approach. Most current papers use prior knowledge (a pretrained human pose / skeleton key-point model) to align pedestrians, and then extract and compare local features.
A pedestrian is usually described by 14 pose points (keypoints); two adjacent pose points are connected to form a bone of the skeleton.
Commonly used pose estimation models include Hourglass, OpenPose, CPM, and AlphaPose.
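As an illustration, a plausible 14-point convention and its skeleton edges might be encoded as below; the exact names and index order vary between models, so treat this layout as an assumption rather than any model's real output format:

```python
# Hypothetical 14-point convention; real index orders differ per model.
KEYPOINTS = ["head", "neck",
             "r_shoulder", "r_elbow", "r_wrist",
             "l_shoulder", "l_elbow", "l_wrist",
             "r_hip", "r_knee", "r_ankle",
             "l_hip", "l_knee", "l_ankle"]

# Each skeleton bone connects two adjacent pose points (by index).
SKELETON = [(0, 1),                       # head-neck
            (1, 2), (2, 3), (3, 4),       # right arm
            (1, 5), (5, 6), (6, 7),       # left arm
            (1, 8), (8, 9), (9, 10),      # right leg
            (1, 11), (11, 12), (12, 13)]  # left leg
```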

Related algorithms

1.PIE

Pose Invariant Embedding for Deep Person Re-identification
This early pose-detection paper's main contributions are roughly as follows:
CPM is used for key-point detection. CPM is a sequential convolutional architecture that detects 14 body joints: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles, as shown in the first and second columns of the figure.
The image is divided into several parts, and each part undergoes an affine transformation so that it is aligned into a rectangular region. This solves the problem that the same part has different sizes and poses in different images, as shown in the third and fourth columns of the figure.
The features of the original image and the affine-transformed image are fused, and an ID loss is used to train the network.
As shown in the figure, the original image and the PoseBox first pass through two convolutional neural networks that do not share weights to obtain their respective features; these are then combined with a 14-dimensional pose confidence score and fed into the PIE network, which fuses the corresponding features. The three resulting losses, from top to bottom, are the global loss, the fusion loss, and the local loss.
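A toy sketch of that fusion step, with the feature dimensions and the projection matrix `w` purely hypothetical:

```python
import numpy as np

def fuse_pie(global_feat, posebox_feat, pose_conf, w):
    """Toy PIE-style fusion: concatenate the two stream features with
    the 14-dim pose confidence vector, then project with a
    (hypothetical) learned matrix w."""
    x = np.concatenate([global_feat, posebox_feat, pose_conf])
    return w @ x

g = np.random.rand(256)                  # original-image stream feature
p = np.random.rand(256)                  # PoseBox stream feature
conf = np.random.rand(14)                # pose confidence scores
w = np.random.rand(128, 256 + 256 + 14)  # toy projection
fused = fuse_pie(g, p, conf, w)
```

In the real network the fusion is learned end to end together with the global and local branches; here only the data flow is shown.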

2.Spindle Net

Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion
This is a classic paper that uses pose key points for person re-identification. As shown below, a key-point network first extracts 14 body key points, from which 7 ROIs of the human body are derived, corresponding to the head, upper body, lower body, left arm, right arm, left leg, and right leg.
The 7 ROI regions and the original image are then fed into the same CNN to extract features. The original image passes through the complete CNN to obtain one global feature; the three large regions pass through the FEN-C2 and FEN-C3 sub-networks to obtain three local features; and the four limb regions pass through the FEN-C3 sub-network to obtain four local features. These 8 features are merged at different scales according to the diagram, finally yielding a re-identification feature that combines the global feature with multi-scale local features.
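The stage-wise merging can be sketched as nested concatenations; the dimensions below are invented and the tree is simplified relative to the paper's actual fusion structure:

```python
import numpy as np

def spindle_fuse(global_feat, macro_feats, limb_feats):
    """Merge features stage by stage by concatenation: first the four
    limb features, then the three macro-region features, then the
    global feature (a simplification of the paper's fusion tree)."""
    limbs = np.concatenate(limb_feats)
    macro = np.concatenate(macro_feats + [limbs])
    return np.concatenate([global_feat, macro])

g = np.random.rand(256)                            # whole-image feature
macros = [np.random.rand(128) for _ in range(3)]   # head/upper/lower body
limbs = [np.random.rand(64) for _ in range(4)]     # arms and legs
fused = spindle_fuse(g, macros, limbs)
```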


3.PDC

Pose-driven Deep Convolutional Model for Person Re-identification
Unlike the example above, although PDC also extracts 14 key points per pedestrian, it divides the pedestrian into 6 parts and adopts an improved PTN network that learns the parameters of the affine transformations and automatically places the parts at certain positions in the image; gaps between different parts are allowed.
After the part images are obtained, features are extracted separately from the original image and the pose image. The shallow layers of the network are shared while the deep layers are not. Training the network finally yields a setup similar to the one above: a global loss, a local loss, and a fusion loss.
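The shared-shallow / unshared-deep layout can be sketched with plain matrices (all weights and sizes here are hypothetical):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

# Hypothetical weights: one shared shallow layer, two unshared deep ones.
w_shared = np.random.rand(64, 32)
w_global = np.random.rand(16, 64)
w_local = np.random.rand(16, 64)

def pdc_streams(orig_img_feat, pose_img_feat):
    """Both streams pass through the same shallow weights, then branch
    into deep layers that do not share weights."""
    h_g = relu(w_shared @ orig_img_feat)
    h_l = relu(w_shared @ pose_img_feat)
    return w_global @ h_g, w_local @ h_l

g_out, l_out = pdc_streams(np.random.rand(32), np.random.rand(32))
```

Sharing the shallow layers lets both streams reuse low-level filters while the deep layers specialize to the original image and the pose image respectively.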


4.GLAD

GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval
GLAD divides the human body into three parts (head, upper body, lower body), computes losses through a network that can share weights, and finally concatenates the resulting features into the final descriptor.
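A minimal sketch of that final concatenation step (the dimensions are illustrative assumptions):

```python
import numpy as np

def glad_descriptor(global_feat, head, upper, lower):
    """Concatenate the whole-image feature with the three part
    features produced by a weight-sharing network."""
    return np.concatenate([global_feat, head, upper, lower])

feats = [np.random.rand(128) for _ in range(4)]  # global + 3 parts
desc = glad_descriptor(*feats)
```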


5.PABP

Part-Aligned Bilinear Representations for Person Re-identification
approaches the problem at the pixel level: a Re-ID network extracts the appearance feature map A, OpenPose extracts the pose feature map P, and at each corresponding pixel position the vectors of A and P are combined by an outer product and vectorized.

Summary

  • Use a pose estimation model to obtain the pedestrian's (14) key pose points
  • Obtain part regions with semantic information according to the pose points
  • Extract a local feature from each part region
  • Combining local features with global features often gives better results
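The steps above can be sketched end to end, assuming keypoints are given as (x, y) coordinates in feature-map space and parts are defined as lists of keypoint indices (all hypothetical choices):

```python
import numpy as np

def reid_feature(feat_map, keypoints, part_defs):
    """Pose points -> part boxes -> pooled local features, finally
    concatenated with the global feature."""
    global_f = feat_map.mean(axis=(1, 2))
    local_fs = []
    for idxs in part_defs:                        # keypoint indices per part
        xs, ys = keypoints[idxs, 0], keypoints[idxs, 1]
        x0, x1 = int(xs.min()), int(xs.max()) + 1
        y0, y1 = int(ys.min()), int(ys.max()) + 1
        local_fs.append(feat_map[:, y0:y1, x0:x1].mean(axis=(1, 2)))
    return np.concatenate([global_f] + local_fs)

fmap = np.random.rand(32, 24, 8)
kps = np.column_stack([np.random.randint(0, 8, 14),    # x coordinates
                       np.random.randint(0, 24, 14)])  # y coordinates
feat = reid_feature(fmap, kps, [[0, 1], [2, 5, 8, 11], [8, 11, 12, 13]])
```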


Origin blog.csdn.net/qq_37747189/article/details/109670946