[Pedestrian detection] Miss rate versus false positives per image (FPPI): past and present (hands-on, part 2)

The previous article covered sections one to six.

Today we will use our own data to draw the FPPI chart, so this article starts directly from section seven.


Seven. Prepare gt and dt

In the original author's code, the annotations are stored in vbb format, which is really unfriendly... Fortunately, by reading the source code, we found a way around it!
As long as we know the formats of dts and gts, we can use load(matfile) in place of loadDt and loadGt.

1. Analyzing dts

dts is a 1*N cell, where N is the number of algorithms. In the example data, N = 6.

Each cell contains a 1*M cell, where M is the number of images. In the example data there are 4024 images.

The second cell of dts holds the detection results of the second algorithm. Its entries do not all have the same dimensions, because the number of detections differs from image to image. For example, 3x5 double means 3 pedestrians were detected in that image, and 4x5 double means 4 pedestrians were detected.

Open any one of them and take a look. Each row is one detected pedestrian, and the columns are [x y w h score]: the x coordinate of the upper left corner, the y coordinate of the upper left corner, the width of the box, the height of the box, and the confidence score.

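The score range depends on the algorithm: some scores are greater than 1, and some are even negative. This has no impact on the evaluation, because the curve is obtained by sweeping a threshold over the scores, so only their relative order within one algorithm matters.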
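The layout described above can be mirrored outside MATLAB as nested lists, which makes it easy to sanity-check your own results before exporting them. The following is a minimal Python sketch, not the toolbox's code: the function name validate_dts and the sample boxes are made up for illustration, and writing the result to a .mat file is left out.

```python
import math


def validate_dts(dts):
    """Check the layout dts[algorithm][image] -> rows of [x, y, w, h, score].

    Returns (number of algorithms, number of images).
    Raises ValueError on the first problem found.
    """
    if not dts:
        raise ValueError("dts must contain at least one algorithm")
    num_images = len(dts[0])
    for a, per_image in enumerate(dts):
        # Every algorithm must cover the same set of images.
        if len(per_image) != num_images:
            raise ValueError(
                f"algorithm {a}: expected {num_images} images, got {len(per_image)}"
            )
        for i, boxes in enumerate(per_image):
            for row in boxes:
                if len(row) != 5:
                    raise ValueError(
                        f"algorithm {a}, image {i}: expected 5 columns, got {len(row)}"
                    )
                x, y, w, h, score = row
                if w <= 0 or h <= 0:
                    raise ValueError(f"algorithm {a}, image {i}: non-positive box size")
                # Any finite score is allowed, including values > 1 or < 0.
                if not math.isfinite(score):
                    raise ValueError(f"algorithm {a}, image {i}: score is not finite")
    return len(dts), num_images


# Hypothetical data: 2 algorithms, 3 images. An empty list means no detections.
dts = [
    [[[10, 20, 30, 60, 0.9], [50, 40, 25, 55, 0.4]], [], [[5, 5, 20, 40, 0.7]]],
    [[[12, 22, 28, 58, 3.1]], [[100, 80, 30, 70, -0.5]], []],
]
num_algorithms, num_images = validate_dts(dts)
```

Here the second algorithm uses scores of 3.1 and -0.5, and the check still passes, which matches the point that the score range itself is not restricted.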

Note! dts corresponds to the output you would visualize, that is, the detections after non-maximum suppression. The confidence threshold used in visualization is [value not given].
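For readers whose detector still outputs raw, overlapping boxes, here is a rough Python sketch of greedy non-maximum suppression on rows of [x y w h score]. It is a generic textbook variant under assumed settings (plain IoU, threshold 0.5), not necessarily the suppression your detector or the evaluation toolbox uses; the names iou and nms are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x, y, w, h, ...]."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the threshold."""
    kept = []
    for box in sorted(boxes, key=lambda row: row[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


# Hypothetical raw detections for one image: the first two overlap heavily.
raw = [
    [10, 10, 20, 40, 0.9],
    [12, 11, 20, 40, 0.8],
    [100, 50, 20, 40, 0.3],
]
suppressed = nms(raw)
```

In this example the second box is dropped because it overlaps the higher-scoring first box, while the low-scoring third box is kept. No score threshold is applied here; low-confidence boxes stay in the output.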

2. Analyzing gts

Since we only use one dataset, gts is a 1*1 cell. That cell contains a 1*M cell, where M is the number of images (4024 in the example data). This is the same layout as dts, but note that the order of the 4024 images must match the order used in dts.
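Because the ground truth and the detections are paired purely by position, it is worth checking the order explicitly. The Python sketch below assumes you keep a list of image names alongside each structure; those name lists, and the helpers check_alignment and reorder_to_match, are hypothetical and not part of the original code.

```python
def check_alignment(gt_names, dt_names):
    """Return (index, gt_name, dt_name) for every position where the names differ."""
    if len(gt_names) != len(dt_names):
        raise ValueError(
            f"gt has {len(gt_names)} images but dt has {len(dt_names)}"
        )
    return [
        (i, g, d)
        for i, (g, d) in enumerate(zip(gt_names, dt_names))
        if g != d
    ]


def reorder_to_match(gt_names, dt_names, dt_per_image):
    """Reorder one algorithm's per-image detections to follow the gt order."""
    position = {name: i for i, name in enumerate(dt_names)}
    return [dt_per_image[position[name]] for name in gt_names]


# Hypothetical example: the first two detection results are swapped.
gt_names = ["img_0001", "img_0002", "img_0003"]
dt_names = ["img_0002", "img_0001", "img_0003"]
dt_per_image = [[[1, 1, 10, 20, 0.5]], [], [[3, 3, 10, 20, 0.8]]]

mismatches = check_alignment(gt_names, dt_names)
aligned = reorder_to_match(gt_names, dt_names, dt_per_image)
```

If mismatches is empty, the two structures are already in the same order. Otherwise, reorder each algorithm's detections before building dts.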

Open the cell and keep looking. It holds the ground truth of each image. Some images have no annotated pedestrians, so their entries are empty.

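Open any image and keep looking. The structure is very similar to dts. Each row is a ground-truth box with five columns: the x coordinate of the upper left corner, the y coordinate of the upper left corner, the width of the box, and the height of the box, followed by a fifth value. Note that in Piotr's toolbox (bbGt.evalRes), which the evaluation code builds on, ground-truth rows are documented as [x y w h ignore], so the fifth column is an ignore flag rather than a confidence score.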
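To show how the two structures are consumed, here is a simplified Python sketch that turns the per-image ground truth and one algorithm's detections into a single (miss rate, FPPI) point at a given score threshold. It is an illustration under stated assumptions, not the toolbox's implementation: matching is greedy with IoU >= 0.5, only the first four columns of each ground-truth row are used, and ignore regions are not handled. The function names and sample data are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x, y, w, h, ...]."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def miss_rate_and_fppi(gt_per_image, dt_per_image, score_threshold, iou_threshold=0.5):
    """Return (miss rate, false positives per image) at one score threshold."""
    if not gt_per_image or len(gt_per_image) != len(dt_per_image):
        raise ValueError("gt and dt must cover the same non-empty list of images")
    num_gt = true_pos = false_pos = 0
    for gt_boxes, dt_boxes in zip(gt_per_image, dt_per_image):
        num_gt += len(gt_boxes)
        matched = [False] * len(gt_boxes)
        # Keep detections at or above the threshold, highest score first.
        kept = sorted(
            (d for d in dt_boxes if d[4] >= score_threshold),
            key=lambda d: d[4],
            reverse=True,
        )
        for det in kept:
            best_overlap, best_j = iou_threshold, -1
            for j, gt in enumerate(gt_boxes):
                if matched[j]:
                    continue
                overlap = iou(det, gt)
                if overlap >= best_overlap:
                    best_overlap, best_j = overlap, j
            if best_j >= 0:
                matched[best_j] = True  # each gt box can be matched once
                true_pos += 1
            else:
                false_pos += 1
    miss_rate = 1.0 - true_pos / num_gt if num_gt else 0.0
    return miss_rate, false_pos / len(gt_per_image)


# Hypothetical data: 3 images; the second one has no annotated pedestrians.
gt = [[[10, 10, 20, 40, 0]], [], [[50, 50, 20, 40, 0], [100, 50, 20, 40, 0]]]
dt = [
    [[11, 10, 20, 40, 0.9]],
    [[5, 5, 20, 40, 0.6]],
    [[50, 52, 20, 40, 0.8], [200, 200, 20, 40, 0.2]],
]
point = miss_rate_and_fppi(gt, dt, score_threshold=0.5)
```

Calling the function over a range of thresholds gives the points of a miss rate versus FPPI curve for one algorithm. Lowering the threshold admits more detections, so FPPI can only stay the same or grow.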

[We only need to replace gts and dts with our own results, to be written later]


Origin blog.csdn.net/weixin_38705903/article/details/109696278