YOLOv11改进 | 损失函数改进篇 | QualityFocalLoss质量焦点损失（含代码 + 详细修改教程）

一、本文介绍

本文给大家带来的改进机制是QualityFocalLoss，其是一种CLS分类损失函数，它的主要创新是将目标的定位质量（如边界框与真实对象的重叠度量，例如IoU得分）直接融合到分类损失中，形成一个联合表示。这种方法能够解决传统目标检测中分类与定位任务之间存在的不一致性问题。QFL通过为每个类别的得分赋予根据定位质量调整的权重，使得检测模型在训练过程中能够更加关注那些难以定位或分类的样本。

在开始之前给大家推荐一下我的专栏，本专栏每周更新3-10篇最新前沿机制 | 包括二次创新全网无重复，以及融合改进(大家拿到之后添加另外一个改进机制在你的数据集上实现涨点即可撰写论文)，还有各种前沿顶会改进机制 |，更有包含我所有附赠的文件（文件内集成我所有的改进机制全部注册完毕可以直接运行）和交流群和视频讲解提供给大家。

欢迎大家订阅我的专栏一起学习YOLO！

专栏回顾：YOLOv11改进系列专栏——本专栏持续复习各种顶会内容——科研必备

一、本文介绍

二、Quality Focal Loss原理

2.1 Quality Focal Loss的基本原理

3.1 修改一

3.2 修改二

3.3 修改三

四、本文总结

二、Quality Focal Loss原理

论文地址： 官方论文地址

代码地址： 官方代码地址

2.1 Quality Focal Loss的基本原理

Quality Focal Loss (QFL) 是一种用于目标检测的改进损失函数。它的主要创新是将目标的定位质量（如边界框与真实对象的重叠度量，例如IoU得分）直接融合到分类损失中，形成一个联合表示。这种方法能够解决传统目标检测中分类与定位任务之间存在的不一致性问题。QFL通过为每个类别的得分赋予根据定位质量调整的权重，使得检测模型在训练过程中能够更加关注那些难以定位或分类的样本。

Quality Focal Loss（QFL）的基本原理可以分为以下几个要点：

1. 联合表示法: QFL将定位质量（如IoU分数）与分类得分融合为一个联合表示，这种表示在训练和推理过程中保持一致性，有助于解决在训练和测试阶段对质量估计和分类得分使用不一致的问题。

2. 连续标签支持: 传统的Focal Loss仅支持离散的{0, 1}标签，而QFL扩展了这一概念，支持连续的标签（如IoU分数），从0到1的浮点数，更好地反映了实际数据中的情况。

3. 动态调整难度: QFL通过动态调整损失函数，使得模型在训练过程中更多地关注难以分类或定位的样本，从而提高模型的整体性能。

2.2 联合表示法

联合表示法是Quality Focal Loss中的核心概念，它将分类得分和定位质量（例如IoU得分）整合到单一的预测向量中。这种表示方法解决了传统目标检测方法中训练和推理阶段质量估计与分类评分分离使用的不一致性问题。具体来说，它允许模型在预测分类的同时，估计每个检测框的定位质量，从而在非最大抑制（NMS）处理中提供更准确的排序得分，改善检测性能。

下图展示了传统方法（Existing Work）和我们的方法之间在分类和定位质量估计方面的不同表示形式的对比：

在图(a)中，即现有工作，训练和测试阶段分别独立处理分类得分、边界框回归和IoU/centerness得分，这导致了训练和推理之间的不一致性。

而在图(b)中，我们的方法则将分类得分和IoU得分结合为一个联合表示，即在训练和测试时都使用的分类与IoU联合得分。这种联合表示提高了训练和推理之间的一致性，与Quality Focal Loss的基本原理中提到的“联合表示法”紧密相关。在Quality Focal Loss中，通过这种方式，模型能够在训练过程中考虑到每个样本的定位质量，使得损失函数能够更加关注那些定位或分类困难的样本。

2.3 连续标签支持

连续标签支持是指在质量焦点损失（Quality Focal Loss, QFL）中，分类的输出标签不再是传统的0或1（如在one-hot编码中），而是可以取任意在0到1之间的连续值。这些连续值代表了目标定位的质量，通常是指与真实边界框的交并比（IoU）。通过这种方式，QFL可以直接在损失函数中整合定位质量，使得损失函数能够对定位不准确的样本施加更大的权重，从而激励模型学习更准确地预测边界框。

下面这张图比较了传统目标检测方法和提出的广义焦点损失（GFL）方法之间的差异：

在传统方法（Existing Work）中，分类分支使用one-hot标签进行正类和负类的区分，而回归分支则采用Dirac delta分布进行边界框的预测。相比之下，GFL方法引入了质量焦点损失（QFL）和分布焦点损失（DFL）。QFL通过软one-hot标签（IoU标签）进行学习，这些标签反映了边界框的定位质量。同时，DFL使用一般分布 $P_{x}$ 来模拟边界框位置的概率分布。

2.4 动态调整难度

动态调整难度是质量焦点损失（Quality Focal Loss, QFL）的一个特点，它允许模型在训练过程中更多地关注那些难以分类或定位的样本。这是通过调整损失函数中的一个参数来实现的，该参数会增加对模型预测不确定性较高的样本的损失值，使得模型更加集中于这些难以预测的样本上。这种方法旨在提高模型对困难样本的敏感性，帮助模型更加精确地进行分类和定位，尤其是在面对复杂或模糊的检测场景时。

下面这张图展示了Quality Focal Loss (QFL)在不同β参数下的损失曲线，以及不同分布对于相同积分目标的表示和实际边界框回归目标的分布直方图：

图(a)展示的QFL损失曲线与基本原理中的“动态调整难度”相关，因为它展示了如何通过调整β参数来调节模型对于难以预测样本的关注度。

图(b)展示了不同的概率分布如何针对相同的积分目标（即回归目标）进行调整，这与“表示任意分布的边界框位置”的原理有关。

图(c)则是实际数据集中回归目标的分布，这有助于我们理解和验证QFL在实际应用中的效果。

三、核心代码

使用方式看章节四

from .tal import bbox2dist
import torch.nn.functional as F
import math


class QualityfocalLoss(nn.Module):
    def __init__(self, beta=2.0):
        super().__init__()
        self.beta = beta

    def forward(self, pred_score, gt_score, gt_target_pos_mask):
        # negatives are supervised by 0 quality score
        pred_sigmoid = pred_score.sigmoid()
        scale_factor = pred_sigmoid
        zerolabel = scale_factor.new_zeros(pred_score.shape)
        with torch.cuda.amp.autocast(enabled=False):
            loss = F.binary_cross_entropy_with_logits(pred_score, zerolabel, reduction='none') * scale_factor.pow(
                self.beta)

        scale_factor = gt_score[gt_target_pos_mask] - pred_sigmoid[gt_target_pos_mask]
        with torch.cuda.amp.autocast(enabled=False):
            loss[gt_target_pos_mask] = F.binary_cross_entropy_with_logits(pred_score[gt_target_pos_mask],
                                                                          gt_score[gt_target_pos_mask],
                                                                          reduction='none') * scale_factor.abs().pow(
                self.beta)
        return loss


class SlideLoss(nn.Module):
    def __init__(self, loss_fcn):
        super(SlideLoss, self).__init__()
        self.loss_fcn = loss_fcn
        self.reduction = loss_fcn.reduction
        self.loss_fcn.reduction = 'none'  # required to apply SL to each element

    def forward(self, pred, true, auto_iou=0.5):
        loss = self.loss_fcn(pred, true)
        if auto_iou < 0.2:
            auto_iou = 0.2
        b1 = true <= auto_iou - 0.1
        a1 = 1.0
        b2 = (true > (auto_iou - 0.1)) & (true < auto_iou)
        a2 = math.exp(1.0 - auto_iou)
        b3 = true >= auto_iou
        a3 = torch.exp(-(true - 1.0))
        modulating_weight = a1 * b1 + a2 * b2 + a3 * b3
        loss *= modulating_weight
        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:  # 'none'
            return loss


class Focal_Loss(nn.Module):
    # Wraps focal loss around existing loss_fcn(), i.e. criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
        super().__init__()
        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss()
        self.gamma = gamma
        self.alpha = alpha
        self.reduction = loss_fcn.reduction
        self.loss_fcn.reduction = 'none'  # required to apply FL to each element

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)
        # p_t = torch.exp(-loss)
        # loss *= self.alpha * (1.000001 - p_t) ** self.gamma  # non-zero power for gradient stability

        # TF implementation https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/losses/focal_loss.py
        pred_prob = torch.sigmoid(pred)  # prob from logits
        p_t = true * pred_prob + (1 - true) * (1 - pred_prob)
        alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)
        modulating_factor = (1.0 - p_t) ** self.gamma
        loss *= alpha_factor * modulating_factor

        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:  # 'none'
            return loss

def reduce_loss(loss, reduction):
    """Reduce loss as specified.

    Args:
        loss (Tensor): Elementwise loss tensor.
        reduction (str): Options are "none", "mean" and "sum".

    Return:
        Tensor: Reduced loss tensor.
    """
    reduction_enum = F._Reduction.get_enum(reduction)
    # none: 0, elementwise_mean:1, sum: 2
    if reduction_enum == 0:
        return loss
    elif reduction_enum == 1:
        return loss.mean()
    elif reduction_enum == 2:
        return loss.sum()


def weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None):
    """Apply element-wise weight and reduce loss.

    Args:
        loss (Tensor): Element-wise loss.
        weight (Tensor): Element-wise weights.
        reduction (str): Same as built-in losses of PyTorch.
        avg_factor (float): Avarage factor when computing the mean of losses.

    Returns:
        Tensor: Processed loss values.
    """
    # if weight is specified, apply element-wise weight
    if weight is not None:
        loss = loss * weight

    # if avg_factor is not specified, just reduce the loss
    if avg_factor is None:
        loss = reduce_loss(loss, reduction)
    else:
        # if reduction is mean, then average the loss by avg_factor
        if reduction == 'mean':
            loss = loss.sum() / avg_factor
        # if reduction is 'none', then do nothing, otherwise raise an error
        elif reduction != 'none':
            raise ValueError('avg_factor can not be used with reduction="sum"')
    return loss



def varifocal_loss(pred,
                   target,
                   weight=None,
                   alpha=0.75,
                   gamma=2.0,
                   iou_weighted=True,
                   reduction='mean',
                   avg_factor=None):
    """`Varifocal Loss <https://arxiv.org/abs/2008.13367>`_

    Args:
        pred (torch.Tensor): The prediction with shape (N, C), C is the
            number of classes
        target (torch.Tensor): The learning target of the iou-aware
            classification score with shape (N, C), C is the number of classes.
        weight (torch.Tensor, optional): The weight of loss for each
            prediction. Defaults to None.
        alpha (float, optional): A balance factor for the negative part of
            Varifocal Loss, which is different from the alpha of Focal Loss.
            Defaults to 0.75.
        gamma (float, optional): The gamma for calculating the modulating
            factor. Defaults to 2.0.
        iou_weighted (bool, optional): Whether to weight the loss of the
            positive example with the iou target. Defaults to True.
        reduction (str, optional): The method used to reduce the loss into
            a scalar. Defaults to 'mean'. Options are "none", "mean" and
            "sum".
        avg_factor (int, optional): Average factor that is used to average
            the loss. Defaults to None.
    """
    # pred and target should be of the same size
    assert pred.size() == target.size()
    pred_sigmoid = pred.sigmoid()
    target = target.type_as(pred)
    if iou_weighted:
        focal_weight = target * (target > 0.0).float() + \
            alpha * (pred_sigmoid - target).abs().pow(gamma) * \
            (target <= 0.0).float()
    else:
        focal_weight = (target > 0.0).float() + \
            alpha * (pred_sigmoid - target).abs().pow(gamma) * \
            (target <= 0.0).float()
    loss = F.binary_cross_entropy_with_logits(
        pred, target, reduction='none') * focal_weight
    loss = weight_reduce_loss(loss, weight, reduction, avg_factor)
    return loss

class Vari_focalLoss(nn.Module):

    def __init__(self,
                 use_sigmoid=True,
                 alpha=0.75,
                 gamma=2.0,
                 iou_weighted=True,
                 reduction='sum',
                 loss_weight=1.0):
        """`Varifocal Loss <https://arxiv.org/abs/2008.13367>`_

        Args:
            use_sigmoid (bool, optional): Whether the prediction is
                used for sigmoid or softmax. Defaults to True.
            alpha (float, optional): A balance factor for the negative part of
                Varifocal Loss, which is different from the alpha of Focal
                Loss. Defaults to 0.75.
            gamma (float, optional): The gamma for calculating the modulating
                factor. Defaults to 2.0.
            iou_weighted (bool, optional): Whether to weight the loss of the
                positive examples with the iou target. Defaults to True.
            reduction (str, optional): The method used to reduce the loss into
                a scalar. Defaults to 'mean'. Options are "none", "mean" and
                "sum".
            loss_weight (float, optional): Weight of loss. Defaults to 1.0.
        """
        super(Vari_focalLoss, self).__init__()
        assert use_sigmoid is True, \
            'Only sigmoid varifocal loss supported now.'
        assert alpha >= 0.0
        self.use_sigmoid = use_sigmoid
        self.alpha = alpha
        self.gamma = gamma
        self.iou_weighted = iou_weighted
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        """Forward function.

        Args:
            pred (torch.Tensor): The prediction.
            target (torch.Tensor): The learning target of the prediction.
            weight (torch.Tensor, optional): The weight of loss for each
                prediction. Defaults to None.
            avg_factor (int, optional): Average factor that is used to average
                the loss. Defaults to None.
            reduction_override (str, optional): The reduction method used to
                override the original reduction method of the loss.
                Options are "none", "mean" and "sum".

        Returns:
            torch.Tensor: The calculated loss
        """
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        if self.use_sigmoid:
            loss_cls = self.loss_weight * varifocal_loss(
                pred,
                target,
                weight,
                alpha=self.alpha,
                gamma=self.gamma,
                iou_weighted=self.iou_weighted,
                reduction=reduction,
                avg_factor=avg_factor)
        else:
            raise NotImplementedError
        return loss_cls

三、使用方式

3.1 修改一

找到'ultralytics/utils/loss.py'文件，将上面的代码复制在文件的开头（注意是在模块导入的后方）

3.2 修改二

按照图示进行修改即可。

        "下面的代码注释掉就是正常的损失函数，如果不注释使用的就是使用对应的损失失函数"
        # self.bce = Focal_Loss(nn.BCEWithLogitsLoss(reduction='none')) # Focal
        # self.bce = Vari_focalLoss() # VFLoss
        # self.bce = SlideLoss(nn.BCEWithLogitsLoss(reduction='none')) # SlideLoss
        # self.bce = QualityfocalLoss()  # 目前仅持者目标检测需要注意！

3.3 修改三

按照图片进行修改，代码在图片的下方，完全替换call 内的代码即可。

替换代码再次处。

    def __call__(self, preds, batch):
        """Calculate the sum of the loss for box, cls and dfl multiplied by batch size."""
        loss = torch.zeros(3, device=self.device)  # box, cls, dfl
        feats = preds[1] if isinstance(preds, tuple) else preds
        pred_distri, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.no, -1) for xi in feats], 2).split(
            (self.reg_max * 4, self.nc), 1
        )

        pred_scores = pred_scores.permute(0, 2, 1).contiguous()
        pred_distri = pred_distri.permute(0, 2, 1).contiguous()

        dtype = pred_scores.dtype
        batch_size = pred_scores.shape[0]
        imgsz = torch.tensor(feats[0].shape[2:], device=self.device, dtype=dtype) * self.stride[0]  # image size (h,w)
        anchor_points, stride_tensor = make_anchors(feats, self.stride, 0.5)

        # Targets
        targets = torch.cat((batch["batch_idx"].view(-1, 1), batch["cls"].view(-1, 1), batch["bboxes"]), 1)
        targets = self.preprocess(targets.to(self.device), batch_size, scale_tensor=imgsz[[1, 0, 1, 0]])
        gt_labels, gt_bboxes = targets.split((1, 4), 2)  # cls, xyxy
        mask_gt = gt_bboxes.sum(2, keepdim=True).gt_(0)

        # pboxes
        pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)

        target_labels, target_bboxes, target_scores, fg_mask, _ = self.assigner(
            pred_scores.detach().sigmoid(), (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
            anchor_points * stride_tensor, gt_labels, gt_bboxes, mask_gt)

        target_scores_sum = max(target_scores.sum(), 1)

        # Cls loss
        # loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum  # VFL way
        if isinstance(self.bce, (nn.BCEWithLogitsLoss, Vari_focalLoss, Focal_Loss)):
            loss[1] = self.bce(pred_scores, target_scores.to(dtype)).sum() / target_scores_sum  # BCE VFLoss Focal
        elif isinstance(self.bce, SlideLoss):
            if fg_mask.sum():
                auto_iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True).mean()
            else:
                auto_iou = 0.1
            loss[1] = self.bce(pred_scores, target_scores.to(dtype), auto_iou).sum() / target_scores_sum  # SlideLoss
        elif isinstance(self.bce, QualityfocalLoss):
            if fg_mask.sum():
                pos_ious = bbox_iou(pred_bboxes, target_bboxes / stride_tensor, xywh=False).clamp(min=1e-6).detach()
                # 10.0x Faster than torch.one_hot
                targets_onehot = torch.zeros((target_labels.shape[0], target_labels.shape[1], self.nc),
                                             dtype=torch.int64,
                                             device=target_labels.device)  # (b, h*w, 80)
                targets_onehot.scatter_(2, target_labels.unsqueeze(-1), 1)
                cls_iou_targets = pos_ious * targets_onehot
                fg_scores_mask = fg_mask[:, :, None].repeat(1, 1, self.nc)  # (b, h*w, 80)
                targets_onehot_pos = torch.where(fg_scores_mask > 0, targets_onehot, 0)
                cls_iou_targets = torch.where(fg_scores_mask > 0, cls_iou_targets, 0)
            else:
                cls_iou_targets = torch.zeros((target_labels.shape[0], target_labels.shape[1], self.nc),
                                              dtype=torch.int64,
                                              device=target_labels.device)  # (b, h*w, 80)
                targets_onehot_pos = torch.zeros((target_labels.shape[0], target_labels.shape[1], self.nc),
                                                 dtype=torch.int64,
                                                 device=target_labels.device)  # (b, h*w, 80)
            loss[1] = self.bce(pred_scores, cls_iou_targets.to(dtype), targets_onehot_pos.to(torch.bool)).sum() / max(
                fg_mask.sum(), 1)
        else:
            loss[1] = self.bce(pred_scores, target_scores.to(dtype)).sum() / target_scores_sum  # 确保有损失可用

        # Bbox loss
        if fg_mask.sum():
            target_bboxes /= stride_tensor
            loss[0], loss[2] = self.bbox_loss(pred_distri, pred_bboxes, anchor_points, target_bboxes, target_scores,
                                              target_scores_sum, fg_mask)

        loss[0] *= self.hyp.box  # box gain
        loss[1] *= self.hyp.cls  # cls gain
        loss[2] *= self.hyp.dfl  # dfl gain

        return loss.sum() * batch_size, loss.detach()  # loss(box, cls, dfl)

四、本文总结

到此本文的正式分享内容就结束了，在这里给大家推荐我的YOLOv11改进有效涨点专栏，本专栏目前为新开的平均质量分98分，后期我会根据各种最新的前沿顶会进行论文复现，也会对一些老的改进机制进行补充，如果大家觉得本文帮助到你了，订阅本专栏，关注后续更多的更新~

希望大家阅读完以后可以给文章点点赞和评论支持一下这样购买专栏的人越多群内人越多大家交流的机会就更多了。

专栏回顾：YOLOv11改进系列专栏——本专栏持续复习各种顶会内容——科研必备