Challenges of pose estimation:
- The number of people in each picture is unknown;
- The interaction between people is complicated (contact, occlusion, etc.), making it difficult to detect some key points;
- The more people in the image, the greater the time cost, making real-time application difficult.
Evaluation metrics
- PCK, Percentage of Correct Keypoints, the proportion of keypoints that are correctly estimated. A keypoint counts as correct when the normalized distance between the detected keypoint and its ground-truth is less than a set threshold; PCK is the ratio of such keypoints. The FLIC dataset uses the torso size as the normalization reference; the MPII dataset uses the head length.
- PDJ, Percentage of Detected Joints, the ratio of detected joints. A joint counts as detected when the distance between the prediction and the ground-truth is within a given fraction of the torso diameter.
- OKS, Object Keypoint Similarity, measures the similarity between the ground-truth and the predicted keypoints of a person, inspired by IoU.
- OKS matrix.
- AP, Average Precision, given a threshold t, the proportion of predictions whose OKS is larger than t.
- mAP, mean Average Precision, the mean value of AP over a set of different thresholds t.
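As an illustration of the PCK definition in the list above, here is a minimal sketch in plain Python. The function name `pck`, the argument layout, and the default threshold `alpha=0.5` are assumptions made for this example, not part of any dataset toolkit.

```python
import math


def pck(pred, gt, ref_length, alpha=0.5):
    """Fraction of keypoints whose normalized error is below alpha.

    pred, gt   : lists of (x, y) keypoints for one person, in the same order.
    ref_length : normalization reference, e.g. torso size (FLIC) or
                 head length (MPII).
    alpha      : threshold on the normalized distance (assumed default).
    """
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")
    correct = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        dist = math.hypot(px - gx, py - gy)  # Euclidean distance in pixels
        if dist / ref_length < alpha:
            correct += 1
    return correct / len(gt)


# Toy data: pixel errors are 1, 6, 1 and 9 with a reference length of 10.
gt_points = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
pred_points = [(11.0, 10.0), (20.0, 26.0), (30.0, 31.0), (49.0, 40.0)]
print(pck(pred_points, gt_points, ref_length=10.0, alpha=0.5))  # 0.5
```

Two of the four keypoints fall within half the reference length, so the score is 0.5. Changing `ref_length` changes the result, which is why scores normalized by torso size and by head length are not directly comparable.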
$$OKS_p=\frac{\sum_i\exp\left(\frac{-d_{p_i}^2}{2S_p^2\sigma_i^2}\right)\delta(v_{p_i}=1)}{\sum_i\delta(v_{p_i}=1)}$$
Where $p$ is the id of the person in the ground-truth, $i$ is the id of the keypoint, $d_{p_i}$ is the Euclidean distance between the ground-truth and the predicted keypoint, $S_p$ is the scale factor of the current person, $\sigma_i$ is the normalization factor of the i-th keypoint (this factor is obtained from the standard deviation of the ground-truth annotations over the whole dataset and reflects how much the labeling of that keypoint varies; the larger $\sigma$ is, the more difficult the keypoint is to label), $v_{p_i}$ indicates whether the i-th keypoint of the p-th person is visible, and $\delta$ is a Boolean function used to select only the visible keypoints for the calculation.
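The OKS formula above, together with the AP and mAP definitions from the metric list, can be sketched in plain Python as follows. This is a simplified illustration that follows the definitions in this text; it is not the full COCO evaluation procedure, and the function names and toy values are assumptions made for the example.

```python
import math


def oks(pred, gt, visible, scale, sigmas):
    """Object Keypoint Similarity for one person.

    pred, gt : lists of (x, y) keypoints in the same order.
    visible  : list of flags, 1 means the keypoint is visible (v_{p_i}).
    scale    : scale factor of the person (S_p).
    sigmas   : per-keypoint normalization factors (sigma_i).
    """
    total = 0.0
    count = 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v != 1:  # delta(v = 1): skip keypoints that are not visible
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2 * scale ** 2 * sigma ** 2))
        count += 1
    return total / count if count else 0.0


def average_precision(oks_values, t):
    """Proportion of OKS values larger than the threshold t."""
    return sum(1 for o in oks_values if o > t) / len(oks_values)


def mean_average_precision(oks_values, thresholds):
    """Mean of AP over several thresholds."""
    aps = [average_precision(oks_values, t) for t in thresholds]
    return sum(aps) / len(aps)


# Toy example: the third keypoint is far off but not visible, so it is ignored.
gt_kpts = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
pred_kpts = [(0.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
perfect = oks(pred_kpts, gt_kpts, visible=[1, 1, 0], scale=1.0, sigmas=[1.0, 1.0, 1.0])
print(perfect)  # 1.0

scores = [0.9, 0.6, 0.4, 0.8]  # hypothetical OKS values of four predictions
print(average_precision(scores, 0.5))              # 0.75
print(mean_average_precision(scores, [0.5, 0.75]))  # 0.625
```

Because each keypoint's error is divided by both the person scale and its own $\sigma_i$, the same pixel error costs less on a large person or on a keypoint that is hard to label consistently.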
Method classification
- Top-down method: first detect each person (object detection); once the bounding boxes are obtained, run single-person keypoint detection (single-person pose estimation) on each box.
- Bottom-up method: first detect all the keypoints of everyone in the image, then group the keypoints and assemble them into individual people.
Generally, the top-down method has higher accuracy (a two-stage structure: object detection first, then keypoint detection), while the bottom-up method is faster.
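To make the two-stage structure of the top-down method concrete, here is a minimal sketch. The names `top_down_pose`, `fake_detector`, and `fake_pose_model` are hypothetical stand-ins for a real person detector and a real single-person pose network; only the control flow is the point.

```python
def top_down_pose(image, detect_people, estimate_single_pose):
    """Run a two-stage, top-down pose pipeline.

    image                : 2D list of pixel rows.
    detect_people        : callable, image -> list of boxes (x0, y0, x1, y1).
    estimate_single_pose : callable, crop -> list of (x, y) keypoints in
                           crop coordinates.
    Returns one list of keypoints per person, in full-image coordinates.
    """
    poses = []
    for (x0, y0, x1, y1) in detect_people(image):
        # Stage 2 runs once per detected box, so cost grows with the crowd.
        crop = [row[x0:x1] for row in image[y0:y1]]
        keypoints = estimate_single_pose(crop)
        # Shift the keypoints from crop coordinates back to image coordinates.
        poses.append([(x + x0, y + y0) for (x, y) in keypoints])
    return poses


# Hypothetical stand-ins: two fixed boxes, and a "pose" made of the crop center.
def fake_detector(image):
    return [(0, 0, 4, 6), (4, 0, 8, 6)]


def fake_pose_model(crop):
    return [(len(crop[0]) / 2, len(crop) / 2)]


image = [[0] * 8 for _ in range(6)]
poses = top_down_pose(image, fake_detector, fake_pose_model)
print(poses)  # [[(2.0, 3.0)], [(6.0, 3.0)]]
```

The pose model is called once per bounding box, which is why the running time of a top-down pipeline grows with the number of people. A bottom-up method instead runs the keypoint detector once on the whole image and spends its extra effort on grouping.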
- In 2016, CPM and Hourglass were the SOTA algorithms for pose estimation at the time;
- In 2016, the method used by OpenPose was the champion of COCO key point detection;
- In 2017, CPN was the champion of COCO key point detection;
- In 2018, MSPN was the champion of COCO key point detection;
- In 2019, xxx of MSRA proposed HRNet, which verified the importance of spatial resolution.