Prerequisite knowledge for pose estimation

Challenges of pose estimation:

  1. The number of people in each image is unknown.
  2. Interactions between people (contact, occlusion, etc.) are complicated, which makes some keypoints difficult to detect.
  3. The more people in the image, the higher the computational cost, which makes real-time application difficult.

Evaluation metrics

  • PCK, Percentage of Correct Keypoints: the proportion of keypoints that are correctly estimated. A keypoint counts as correct when the normalized distance between the detected keypoint and its ground truth is below a set threshold. The FLIC dataset uses the torso size as the normalization reference; the MPII dataset uses the head length.
  • PDJ, Percentage of Detected Joints: the proportion of joints that are detected, where a joint counts as detected when its distance to the ground truth is within a set fraction of the torso diameter.
  • OKS, Object Keypoint Similarity: measures the similarity between the ground-truth and predicted keypoints of a person; inspired by IoU.
  • OKS matrix.
  • AP, Average Precision: among all OKS values, the proportion that exceed the threshold t.
  • mAP, mean Average Precision: the mean of AP over a set of different thresholds t.
    $$OKS_p = \frac{\sum_i \exp\left(-\frac{d_{p_i}^2}{2 S_p^2 \sigma_i^2}\right)\delta(v_{p_i}=1)}{\sum_i \delta(v_{p_i}=1)}$$
    where $p$ is the id of the person in the ground truth; $i$ is the id of the keypoint; $d_{p_i}$ is the Euclidean distance between the ground-truth and predicted keypoint; $S_p$ is the scale factor of the current person; $\sigma_i$ is the normalization factor of the i-th keypoint (computed from the standard deviation of the annotations over the dataset's ground truth, reflecting how much the labeling of that keypoint varies; the larger $\sigma_i$, the harder the keypoint is to label); $v_{p_i}$ indicates whether the i-th keypoint of the p-th person is visible; and $\delta$ is a Boolean indicator function that selects only the visible keypoints for the calculation.
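
    The metrics above can be sketched in a few lines of plain Python. This is only an illustration of the definitions as stated here, not the official FLIC, MPII or COCO evaluation code. The function names, the default alpha, and the choice to return 0 when no keypoint is visible are all assumptions made for the example.

```python
import math


def pck(pred, gt, ref_length, alpha=0.5):
    """Fraction of keypoints whose normalized error is below alpha.

    pred, gt: lists of (x, y) pairs. ref_length is the normalization
    reference (e.g. torso size or head length). alpha is an assumed default.
    """
    hits = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        if math.hypot(px - gx, py - gy) / ref_length < alpha:
            hits += 1
    return hits / len(gt)


def oks(pred, gt, visible, scale, sigmas):
    """OKS for one person: mean of exp(-d^2 / (2 S^2 sigma^2)) over visible keypoints."""
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v != 1:  # delta(v = 1): skip keypoints that are not visible
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * scale ** 2 * sigma ** 2))
        count += 1
    # Assumption: a person with no visible keypoints scores 0.
    return total / count if count else 0.0


def average_precision(oks_values, t):
    """Proportion of OKS values that exceed the threshold t."""
    return sum(1 for o in oks_values if o > t) / len(oks_values)


def mean_average_precision(oks_values, thresholds):
    """Mean of AP over several thresholds."""
    aps = [average_precision(oks_values, t) for t in thresholds]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    gt = [(10, 10), (20, 20), (30, 30)]
    pred = [(10, 10), (20, 23), (90, 90)]
    visible = [1, 1, 0]  # the third keypoint is not visible, so OKS ignores it
    print(pck(pred, gt, ref_length=10.0))
    print(oks(pred, gt, visible, scale=10.0, sigmas=[0.1, 0.1, 0.1]))
    print(mean_average_precision([0.9, 0.6, 0.3], [0.5, 0.75]))
```

    Note that the badly predicted third keypoint lowers PCK but does not affect OKS here, because OKS only averages over keypoints marked visible.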

Method classification

  • The top-down method first detects each person (object detection) and, once the bounding boxes are obtained, runs single-person keypoint detection (single-person pose estimation) on each box.
  • The bottom-up method first detects all the joints of everyone in the image, then groups the joints and assembles them into individual people.

  Generally, the top-down method is more accurate (a two-stage structure: person detection first, then joint detection within each box), while the bottom-up method is faster.
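
  The structural difference between the two families can be sketched as follows. This is a schematic only: `detect_people`, `estimate_single_pose`, `detect_all_keypoints` and `group_keypoints` are hypothetical callables supplied by the caller, standing in for real models, and do not belong to any particular library.

```python
def top_down(image, detect_people, estimate_single_pose):
    """Stage 1 finds person boxes; stage 2 runs one single-person pass per box."""
    poses = []
    for box in detect_people(image):
        # One pose-estimation call per detected person.
        poses.append(estimate_single_pose(image, box))
    return poses


def bottom_up(image, detect_all_keypoints, group_keypoints):
    """One pass finds every joint in the image; grouping assigns joints to people."""
    keypoints = detect_all_keypoints(image)
    return group_keypoints(keypoints)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model.
    boxes = [(0, 0, 5, 5), (5, 5, 9, 9)]
    poses = top_down(None, lambda img: boxes, lambda img, box: {"box": box})
    print(len(poses))
```

  In the top-down sketch the second stage is called once per detected person, so its cost grows with the number of people in the image. The bottom-up sketch makes a single keypoint-detection pass regardless of how many people are present.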

  • In 2016, CPM and Hourglass were the SOTA algorithms for pose estimation at the time;
  • In 2016, the method used by OpenPose won the COCO keypoint detection challenge;
  • In 2017, CPN won the COCO keypoint detection challenge;
  • In 2018, MSPN won the COCO keypoint detection challenge;
  • In 2019, researchers at MSRA proposed HRNet, which verified the importance of spatial resolution.

Origin blog.csdn.net/qq_19784349/article/details/106369418