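This post briefly summarizes the two most important evaluation metrics for video object segmentation (VOS), J and F, usually reported together as J&F (J is the Jaccard index and F is the boundary F-score). J measures the IoU between the predicted mask and the ground-truth (gt) mask, while F measures how closely the boundary of the predicted mask agrees with the gt boundary. Each is introduced below.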
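In the DAVIS benchmark the two scores are commonly combined into a single number, J&F-Mean, the average of the mean J and the mean F. The sketch below shows only that final arithmetic. The helper name `j_and_f` and the flat lists of per-frame scores are illustrative assumptions: the official DAVIS evaluation first averages scores per object over frames and then over objects, a step this sketch skips.

```python
import numpy as np


def j_and_f(j_scores, f_scores):
    """Return (J-Mean, F-Mean, J&F-Mean) from lists of per-frame scores.

    Hypothetical helper: it averages flat lists and ignores per-object grouping.
    """
    j_mean = float(np.mean(j_scores))
    f_mean = float(np.mean(f_scores))
    # J&F-Mean is the arithmetic mean of the region (J) and boundary (F) means
    return j_mean, f_mean, (j_mean + f_mean) / 2.0


# Made-up per-frame scores for a three-frame clip
j_mean, f_mean, jf = j_and_f([0.8, 0.9, 0.7], [0.6, 0.8, 0.7])
print(round(j_mean, 3), round(f_mean, 3), round(jf, 3))  # 0.8 0.7 0.75
```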
Jaccard
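Computing J is straightforward: it is simply the IoU between the predicted mask and the gt mask, i.e. a ratio whose numerator is the intersection of the foreground regions of the two masks and whose denominator is their union. The implementation is as follows: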
```python
import numpy as np


def db_eval_iou(annotation, segmentation):
    """ Compute region similarity as the Jaccard Index.

    Arguments:
        annotation   (ndarray): binary annotation map.
        segmentation (ndarray): binary segmentation map.

    Return:
        jaccard (float): region similarity
    """
    # Cast to boolean masks (the np.bool alias no longer exists in recent NumPy)
    annotation = annotation.astype(bool)
    segmentation = segmentation.astype(bool)

    # Both masks empty: count as a perfect match
    if np.isclose(np.sum(annotation), 0) and np.isclose(np.sum(segmentation), 0):
        return 1
    else:
        # |intersection| / |union|
        return np.sum((annotation & segmentation)) / \
            np.sum((annotation | segmentation), dtype=np.float32)
```
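To make the ratio concrete, here is a small worked example on 4×4 toy masks. `jaccard` is a hypothetical, self-contained re-statement of the same logic as `db_eval_iou` (including the "both empty means 1" convention), written so the example runs on its own.

```python
import numpy as np


def jaccard(gt, pred):
    """Region similarity J = |gt ∩ pred| / |gt ∪ pred| for binary masks."""
    gt = gt.astype(bool)
    pred = pred.astype(bool)
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        # Both masks empty: treated as a perfect match, as in db_eval_iou
        return 1.0
    return np.logical_and(gt, pred).sum() / float(union)


gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1      # 4 foreground pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1    # 6 foreground pixels, 4 of which overlap gt

print(round(jaccard(gt, pred), 3))  # 0.667 (intersection 4 / union 6)
```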
F-score
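F-score evaluates whether the boundary of the predicted mask matches the boundary of the gt mask. The first step is to extract the boundary pixels of both the predicted mask and the gt mask, producing boundary maps in which boundary pixels are True and all other pixels are False. The F-score is then defined as:
$$F = \frac{2PR}{P+R}$$
where P is the precision and R is the recall, computed respectively as:
$$P = \frac{TP}{TP+FP}$$
$$R = \frac{TP}{TP+FN}$$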
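A minimal sketch of these three formulas, computing P, R and F from raw TP/FP/FN counts. `f_measure` is a hypothetical helper, and the counts in the example are made up. It returns 0 for a zero denominator, which is a simplification and not the empty-boundary convention used in db_eval_boundary.

```python
def f_measure(tp, fp, fn):
    """Return (precision, recall, F) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


p, r, f = f_measure(tp=70, fp=30, fn=50)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.7 0.583 0.636
```

With a single TP count, F equals 2TP / (2TP + FP + FN), here 140 / 220 = 7/11. In db_eval_boundary the numerators of P and R are counted separately (fg_match and gt_match), so one shared TP count is only an approximation of that metric.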
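For P, the denominator is the total number of boundary pixels in the predicted mask, and the numerator is the number of those predicted boundary pixels that actually belong to the gt boundary. For example, if the predicted mask has 100 boundary pixels but only 70 of them are found on the gt boundary (true positives), the precision is 70%. How is that 70 determined, i.e. how do we count how many of the pixels predicted as positive are true positives? The code dilates the gt boundary with binary_dilation, which makes the match tolerant to small spatial offsets: a predicted boundary pixel counts as correct if it lies within bound_pix pixels of the gt boundary. The predicted boundary is multiplied element-wise with the dilated gt boundary, and summing the result gives the number of true positives.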
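The effect of the dilation can be seen on a toy case where the predicted boundary is shifted by one pixel. This sketch uses `scipy.ndimage.binary_dilation` in place of the skimage call in db_eval_boundary; `disk_footprint` and `boundary_precision` are hypothetical helper names, and the disk is built the same way as `skimage.morphology.disk`.

```python
import numpy as np
from scipy.ndimage import binary_dilation


def disk_footprint(radius):
    """Boolean disk of the given radius, built like skimage.morphology.disk."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx ** 2 + yy ** 2 <= radius ** 2


def boundary_precision(fg_boundary, gt_boundary, radius):
    """Fraction of predicted boundary pixels within `radius` px of the gt boundary."""
    n_fg = fg_boundary.sum()
    if n_fg == 0:
        return 1.0  # db_eval_boundary also sets precision = 1 for an empty prediction
    if radius > 0:
        gt_dil = binary_dilation(gt_boundary, structure=disk_footprint(radius))
    else:
        gt_dil = gt_boundary  # no tolerance: require an exact pixel match
    # Element-wise AND of the boolean maps (the `*` in db_eval_boundary), summed = true positives
    tp = np.sum(fg_boundary & gt_dil)
    return tp / float(n_fg)


gt_b = np.zeros((5, 5), dtype=bool)
gt_b[:, 2] = True   # gt boundary: column 2
fg_b = np.zeros((5, 5), dtype=bool)
fg_b[:, 3] = True   # predicted boundary: shifted one pixel to the right

print(boundary_precision(fg_b, gt_b, radius=0))  # 0.0
print(boundary_precision(fg_b, gt_b, radius=1))  # 1.0
```

Without dilation the one-pixel shift makes every predicted pixel a false positive; with a radius of 1 all of them are accepted.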
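Likewise, for R the denominator is the total number of boundary pixels in the gt mask, and the numerator is how many of these actual positives are recovered by the prediction. For example, if the gt boundary has 100 pixels but only 70 of them are predicted as positive while the other 30 are missed (predicted as negative), the recall is 70%. Concretely, the predicted boundary is first dilated with binary_dilation, then the gt boundary is multiplied element-wise with the dilated predicted boundary, and the sum gives the number of matched gt pixels.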
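The recall side can be sketched the same way; here the roles are swapped and the prediction is the map that gets dilated. The toy prediction covers only the upper half of the gt boundary, which shows why precision and recall must be computed separately. As before, `disk_footprint` and `boundary_recall` are hypothetical helpers and `scipy.ndimage.binary_dilation` stands in for the skimage call.

```python
import numpy as np
from scipy.ndimage import binary_dilation


def disk_footprint(radius):
    """Boolean disk of the given radius, built like skimage.morphology.disk."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx ** 2 + yy ** 2 <= radius ** 2


def boundary_recall(fg_boundary, gt_boundary, radius):
    """Fraction of gt boundary pixels within `radius` px of the predicted boundary."""
    n_gt = gt_boundary.sum()
    if n_gt == 0:
        return 1.0  # db_eval_boundary also sets recall = 1 for an empty gt boundary
    if radius > 0:
        fg_dil = binary_dilation(fg_boundary, structure=disk_footprint(radius))
    else:
        fg_dil = fg_boundary
    # Count the gt boundary pixels covered by the dilated prediction
    return np.sum(gt_boundary & fg_dil) / float(n_gt)


gt_b = np.zeros((8, 5), dtype=bool)
gt_b[:, 2] = True     # gt boundary: 8 pixels in column 2
fg_b = np.zeros((8, 5), dtype=bool)
fg_b[:4, 2] = True    # prediction only covers the top 4 of them

print(boundary_recall(fg_b, gt_b, radius=1))  # 0.625
```

Precision for this pair is 1.0, since every predicted pixel lies on the gt boundary; the missed lower half only shows up in recall. The tolerance also lets the gt pixel directly below the prediction count as matched, so recall is 5/8 rather than 4/8.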
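The description above is admittedly a bit convoluted, so in short: precision P starts from the prediction and asks how many of the pixels predicted as boundary really are boundary pixels (checked against the gt); recall R starts from the annotation and asks, given N positive pixels on the gt boundary, how many of them were correctly predicted (checked against the predicted mask).
The implementation of this metric is as follows: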
```python
import numpy as np
from skimage.morphology import binary_dilation, disk


def db_eval_boundary(foreground_mask, gt_mask, bound_th=0.008):
    """
    Compute the boundary F-measure between foreground_mask and gt_mask.

    Calculates precision/recall for boundaries between foreground_mask and
    gt_mask using morphological operators to speed it up.

    Arguments:
        foreground_mask (ndarray): binary segmentation image.
        gt_mask         (ndarray): binary annotated image.
        bound_th        (float):   boundary tolerance; a fraction of the image
                                   diagonal if < 1, otherwise a radius in pixels.

    Returns:
        F (float): boundaries F-measure
    """
    assert np.atleast_3d(foreground_mask).shape[2] == 1

    # Tolerance radius in pixels
    bound_pix = int(bound_th) if bound_th >= 1 else \
        int(np.ceil(bound_th * np.linalg.norm(foreground_mask.shape)))

    # Get the pixel boundaries of both masks
    fg_boundary = seg2bmap(foreground_mask)
    gt_boundary = seg2bmap(gt_mask)

    # Dilate each boundary by the tolerance radius
    fg_dil = binary_dilation(fg_boundary, disk(bound_pix))
    gt_dil = binary_dilation(gt_boundary, disk(bound_pix))

    # Boundary pixels that fall inside the other mask's dilated boundary
    gt_match = gt_boundary * fg_dil
    fg_match = fg_boundary * gt_dil

    # Number of boundary pixels in each mask
    n_fg = np.sum(fg_boundary)
    n_gt = np.sum(gt_boundary)

    # Compute precision and recall (special-casing empty boundaries)
    if n_fg == 0 and n_gt > 0:
        precision = 1
        recall = 0
    elif n_fg > 0 and n_gt == 0:
        precision = 0
        recall = 1
    elif n_fg == 0 and n_gt == 0:
        precision = 1
        recall = 1
    else:
        precision = np.sum(fg_match) / float(n_fg)
        recall = np.sum(gt_match) / float(n_gt)

    # Compute F measure
    if precision + recall == 0:
        F = 0
    else:
        F = 2 * precision * recall / (precision + recall)

    return F


def seg2bmap(seg, width=None, height=None):
    """
    From a segmentation, compute a binary boundary map with 1 pixel wide
    boundaries. The boundary pixels are offset by 1/2 pixel towards the
    origin from the actual segment boundary.

    Arguments:
        seg    : Segments labeled from 1..k.
        width  : Width of desired bmap  <= seg.shape[1]
        height : Height of desired bmap <= seg.shape[0]

    Returns:
        bmap (ndarray): Binary boundary map.

    David Martin, January 2003
    """
    # Any non-zero label becomes foreground
    seg = seg.astype(bool)
    assert np.atleast_3d(seg).shape[2] == 1

    width = seg.shape[1] if width is None else width
    height = seg.shape[0] if height is None else height

    h, w = seg.shape[:2]

    ar1 = float(width) / float(height)
    ar2 = float(w) / float(h)

    assert not (width > w or height > h or abs(ar1 - ar2) > 0.01), \
        "Can't convert %dx%d seg to %dx%d bmap." % (w, h, width, height)

    # Shifted copies of seg: east, south and south-east neighbours
    e = np.zeros_like(seg)
    s = np.zeros_like(seg)
    se = np.zeros_like(seg)

    e[:, :-1] = seg[:, 1:]
    s[:-1, :] = seg[1:, :]
    se[:-1, :-1] = seg[1:, 1:]

    # A pixel is a boundary pixel if any of those neighbours differs from it
    b = (seg ^ e) | (seg ^ s) | (seg ^ se)
    # Last row/column have no south/east neighbours: compare along the edge only
    b[-1, :] = seg[-1, :] ^ e[-1, :]
    b[:, -1] = seg[:, -1] ^ s[:, -1]
    b[-1, -1] = False

    if w == width and h == height:
        bmap = b
    else:
        # Downscaling: map each boundary pixel (0-based) to its cell in the smaller map
        bmap = np.zeros((height, width), dtype=bool)
        for x in range(w):
            for y in range(h):
                if b[y, x]:
                    j = y * height // h
                    i = x * width // w
                    bmap[j, i] = True

    return bmap
```
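The default `bound_th=0.008` is a fraction of the image diagonal, so the tolerance radius grows with resolution. The sketch below repeats only the `bound_pix` computation from db_eval_boundary (the helper name `boundary_tolerance` is hypothetical) to show the radius for two common frame sizes, 854×480 and 1920×1080.

```python
import numpy as np


def boundary_tolerance(shape, bound_th=0.008):
    """Dilation radius in pixels, mirroring the bound_pix line of db_eval_boundary."""
    if bound_th >= 1:
        return int(bound_th)  # values >= 1 are already a radius in pixels
    # Fraction of the image diagonal, rounded up
    return int(np.ceil(bound_th * np.linalg.norm(shape)))


print(boundary_tolerance((480, 854)))    # 8  (diagonal ≈ 979.7 px)
print(boundary_tolerance((1080, 1920)))  # 18 (diagonal ≈ 2202.9 px)
```

In other words, at 480p a predicted boundary pixel up to 8 pixels away from the gt boundary still counts as a match.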
Some details of the code are still not entirely clear to me, but the overall idea is as described above.
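One way to demystify seg2bmap is to run its same-size branch on a tiny mask. `boundary_map` below is a hypothetical, standalone copy of that branch (no resizing), applied to a 3×3 square inside a 5×5 image.

```python
import numpy as np


def boundary_map(seg):
    """Same-size branch of seg2bmap: mark pixels whose east, south or
    south-east neighbour has a different value."""
    seg = np.asarray(seg).astype(bool)
    e = np.zeros_like(seg)
    s = np.zeros_like(seg)
    se = np.zeros_like(seg)
    e[:, :-1] = seg[:, 1:]       # east neighbour
    s[:-1, :] = seg[1:, :]       # south neighbour
    se[:-1, :-1] = seg[1:, 1:]   # south-east neighbour
    b = (seg ^ e) | (seg ^ s) | (seg ^ se)
    # Last row/column have no south/east neighbours: compare along the edge only
    b[-1, :] = seg[-1, :] ^ e[-1, :]
    b[:, -1] = seg[:, -1] ^ s[:, -1]
    b[-1, -1] = False
    return b


square = np.zeros((5, 5), dtype=bool)
square[1:4, 1:4] = True
print(boundary_map(square).astype(int))
# [[1 1 1 1 0]
#  [1 0 0 1 0]
#  [1 0 0 1 0]
#  [1 1 1 1 0]
#  [0 0 0 0 0]]
```

The square occupies rows and columns 1 to 3, yet the extracted ring spans rows and columns 0 to 3: because each pixel is compared with its east, south and south-east neighbours, the boundary is pulled toward the origin, which is the "offset by 1/2 pixel" mentioned in the seg2bmap docstring.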