Handling Imbalanced Data with class weight and sample weight

class weight: assign a weight to each class in the training set. Classes with many samples get a low weight; classes with few samples get a high weight.

sample weight: assign a weight to each individual sample. The idea mirrors class weights: samples from majority classes get low weights, and samples from minority classes get high weights [1].

PS: the vast majority of sklearn's classifiers accept both class_weight and sample_weight.
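As a minimal sketch of the sklearn usage (the data here is made up; most estimators take `class_weight` at construction and `sample_weight` in `fit`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.array([0] * 180 + [1] * 20)   # imbalanced labels: 180 vs 20

# class_weight: set at construction; 'balanced' uses inverse class frequency
clf = LogisticRegression(class_weight='balanced').fit(X, y)

# sample_weight: passed per-sample to fit(); here we upweight the minority class
sw = np.where(y == 1, 9.0, 1.0)
clf2 = LogisticRegression().fit(X, y, sample_weight=sw)
```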

|  | PyTorch | TensorFlow 2 & Keras |
| --- | --- | --- |
| class weight | Multi-class: `torch.nn.CrossEntropyLoss(weight=…)`<br>Binary / multi-label: `torch.nn.BCEWithLogitsLoss(pos_weight=…)` | Binary: `tf.nn.weighted_cross_entropy_with_logits(pos_weight=…)`<br>Binary or multi-class [2]: `model.fit(class_weight=…)` |
| sample weight | Multi-label: `torch.nn.BCEWithLogitsLoss(weight=…)` | `model.fit(sample_weight=…)` |
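To make the semantics concrete, here is a NumPy sketch of what the `weight` argument to `torch.nn.CrossEntropyLoss` does (the numbers are made-up examples): each per-sample loss is scaled by the weight of that sample's true class, and the default `'mean'` reduction divides by the sum of the applied weights.

```python
import numpy as np

def weighted_cross_entropy(logits, targets, weight):
    """Weighted CE mimicking torch.nn.CrossEntropyLoss(weight=..., reduction='mean')."""
    shifted = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    w = weight[targets]                                             # weight of each true class
    per_sample = -w * log_probs[np.arange(len(targets)), targets]
    return per_sample.sum() / w.sum()                               # normalize by applied weights

logits = np.array([[2.0, 0.5], [0.2, 1.5], [1.0, 1.0]])
targets = np.array([0, 1, 1])
plain = weighted_cross_entropy(logits, targets, np.array([1.0, 1.0]))
upweighted = weighted_cross_entropy(logits, targets, np.array([1.0, 3.0]))  # class 1 counts 3x
```

Because the class-1 samples here have larger losses than the class-0 sample, upweighting class 1 increases the reduced loss.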

Caveats when using class weight [2]

1. Using class_weight changes the range of the loss, which can affect training stability. Problems arise with optimizers whose step size depends on the magnitude of the gradient (such as plain SGD), whereas optimizers like Adam are unaffected. Also note that the loss of a model trained with class_weight cannot be compared directly to the loss of a model trained without it.

> Note: Using class_weights changes the range of the loss. This may affect the stability of the training depending on the optimizer. Optimizers whose step size is dependent on the magnitude of the gradient, like optimizers.SGD, may fail. The optimizer used here, optimizers.Adam, is unaffected by the scaling change. Also note that because of the weighting, the total losses are not comparable between the two models.

2. Setting class weights takes some care. In the imbalanced binary-classification problem of reference [2], the class weights are chosen to be inversely proportional to the class frequencies and then rescaled so that the overall loss stays close in magnitude to the unweighted loss.
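That weighting scheme can be sketched as follows (the class counts here are made-up example numbers):

```python
# Example counts for an imbalanced binary problem
neg, pos = 900, 100
total = neg + pos

# Scaling by total / 2 keeps the overall loss at roughly the same
# magnitude as the unweighted loss.
weight_for_0 = (1 / neg) * (total / 2.0)
weight_for_1 = (1 / pos) * (total / 2.0)
class_weight = {0: weight_for_0, 1: weight_for_1}
```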

Alternatively, class_weight can be computed with sklearn.utils, as mentioned in reference [4], or with your own custom scheme.
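A short sketch of the sklearn.utils route (labels are made-up example data): `compute_class_weight('balanced', ...)` returns `n_samples / (n_classes * count_per_class)` for each class.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 900 + [1] * 100)   # imbalanced labels
weights = compute_class_weight(class_weight='balanced',
                               classes=np.array([0, 1]), y=y)
class_weight = dict(zip([0, 1], weights))
```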

References:

[1] Sample Weight & Class Weight

[2] Classification on imbalanced data

[3] Tensorflow Doc: Model.fit

[4] How to set class weights for imbalanced classes in Keras

Reposted from blog.csdn.net/xpy870663266/article/details/104600054