AIMET API Documentation (5)

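1.1.7.6 Code Example - Adaptive Rounding (AdaRound)

This example shows how to use AIMET to perform Adaptive Rounding (AdaRound).

Load the model

For this example, we load a pretrained ResNet18 model from torchvision. Similarly, you can load any pretrained PyTorch model instead.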

    import torch
    from torchvision import models

    model = models.resnet18(pretrained=True).eval()

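Prepare the model for quantization simulation

AIMET quantization simulation requires the user's model definition to follow certain guidelines. For example, functionals defined in the forward pass should be changed to the equivalent torch.nn.Module. The AIMET user guide lists all of these guidelines. The following ModelPreparer API uses the graph transformation feature (torch.fx) available in PyTorch 1.9+ and automates the model definition changes required to comply with these guidelines.

For more details, please refer to the Model Preparer API: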

    from aimet_torch.model_preparer import prepare_model
    prepared_model = prepare_model(model)

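Apply AdaRound

We can now apply AdaRound to this model.

Some of the AdaRound parameters are explained below:

dataloader: AdaRound needs a data loader to learn the rounding vectors through layer-by-layer optimization on data samples. Either a training or a validation data loader can be passed in.
num_batches: The number of batches used to evaluate the model while computing the quantization encodings. Typically we want AdaRound to use around 2000 samples, so with a batch size of 32 this translates to about 64 batches. To speed up execution, this example uses a batch size of 1 and num_batches=4.
default_num_iterations: The number of iterations used to AdaRound each layer. The default value is 10000, and we strongly recommend not reducing this number. In this example, however, we use 32 to shorten the execution time.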
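The relationship between the number of samples, the batch size and num_batches is simple arithmetic. The minimal sketch below spells it out; the helper names are hypothetical and not part of AIMET.

```python
import math


def batches_for_samples(target_samples, batch_size):
    """Smallest num_batches whose batches together contain at least target_samples samples."""
    return math.ceil(target_samples / batch_size)


def samples_covered(num_batches, batch_size):
    """Number of samples AdaRound draws from the data loader for a given num_batches."""
    return num_batches * batch_size


if __name__ == "__main__":
    # About 2000 samples at batch size 32: 63 batches suffice; 64 batches give 2048 samples.
    print(batches_for_samples(2000, 32))
    print(samples_covered(64, 32))
    # The settings used in this example (batch size 1, num_batches=4) cover only 4 samples.
    print(samples_covered(4, 1))
```

Rounding 63 up to 64 batches, as the text does, simply gives a round 2048 samples.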

    from aimet_common.defs import QuantScheme
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

    # User action required
    # The following line of code is an example of how to use the ImageNet data's training data loader.
    # Replace the following line with your own dataset's training data loader.
    data_loader = ImageNetDataPipeline.get_train_dataloader()

    params = AdaroundParameters(data_loader=data_loader, num_batches=4, default_num_iterations=32,
                                default_reg_param=0.01, default_beta_range=(20, 2))

    input_shape = (1, 3, 224, 224)
    dummy_input = torch.randn(input_shape)

    # Returns model with adarounded weights and their corresponding encodings
    adarounded_model = Adaround.apply_adaround(prepared_model, dummy_input, params, path='./',
                                               filename_prefix='resnet18', default_param_bw=4,
                                               default_quant_scheme=QuantScheme.post_training_tf_enhanced,
                                               default_config_file=None)
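To make default_reg_param and default_beta_range more concrete, the sketch below reproduces the core of the AdaRound formulation as described in the AdaRound paper (Nagel et al., 2020): each weight is rounded down and then a learned value h(V) in [0, 1] is added, and a regularizer 1 - |2h(V) - 1|^beta pushes every h(V) to exactly 0 or 1. With a high beta (20) the regularizer is nearly flat, so h(V) can move freely; with a low beta (2) it pulls strongly towards 0 or 1. This is a plain-Python illustration, not AIMET's implementation: reading default_reg_param as the weight of the regularizer, the linear beta schedule and all function names are assumptions.

```python
import math

# Stretch/clip constants for the rectified sigmoid, as given in the AdaRound paper.
GAMMA, ZETA = -0.1, 1.1


def rectified_sigmoid(v):
    """h(V) in [0, 1]; 0 means "round down" and 1 means "round up"."""
    if v >= 0:
        s = 1.0 / (1.0 + math.exp(-v))
    else:  # numerically stable branch for large negative v
        e = math.exp(v)
        s = e / (1.0 + e)
    return min(max(s * (ZETA - GAMMA) + GAMMA, 0.0), 1.0)


def soft_quantize(w, scale, v):
    """Dequantized weight scale * (floor(w / scale) + h(V)) instead of scale * round(w / scale)."""
    return scale * (math.floor(w / scale) + rectified_sigmoid(v))


def rounding_regularizer(vs, beta):
    """Sum of 1 - |2 h(V) - 1|^beta; it is 0 only when every h(V) is exactly 0 or 1."""
    return sum(1.0 - abs(2.0 * rectified_sigmoid(v) - 1.0) ** beta for v in vs)


def beta_at(iteration, num_iterations, beta_range=(20, 2)):
    """Illustrative (assumed) linear annealing of beta from beta_range[0] to beta_range[1]."""
    start, end = beta_range
    return start + (end - start) * iteration / num_iterations


def adaround_layer_loss(reconstruction_error, vs, iteration, num_iterations, reg_param=0.01):
    """Per-layer objective: output reconstruction error plus the weighted rounding regularizer."""
    beta = beta_at(iteration, num_iterations)
    return reconstruction_error + reg_param * rounding_regularizer(vs, beta)
```

Once optimization ends, every h(V) is 0 or 1, so each weight has been rounded either down or up; that is exactly what apply_adaround() stores in the returned model and in the resnet18 encodings file.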

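Create the Quantization Simulation Model

Now we use the AdaRounded model to create a QuantizationSimModel. This basically means that AIMET inserts fake quantization ops in the model graph and configures them. A few of the parameters are explained here:

default_param_bw: The QuantizationSimModel must be created with the same parameter bitwidth precision that was used in apply_adaround().
Freezing the parameter encodings: After creating the QuantizationSimModel, the set_and_freeze_param_encodings() API must be called before calling the compute_encodings() API. While applying AdaRound, the parameter values have already been rounded up or down based on these internally created initial encodings. For the accuracy of the quantization simulation, it is important to freeze these encodings. If the parameter encodings are not frozen, the call to compute_encodings() will alter the values of the parameter encodings, and the quantization simulation accuracy will not be correct.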

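    # param_bw must equal the default_param_bw passed to apply_adaround() above (4 in this example)
    quant_scheme = QuantScheme.post_training_tf_enhanced
    param_bw = 4
    output_bw = 8

    sim = QuantizationSimModel(adarounded_model, quant_scheme=quant_scheme, default_param_bw=param_bw,
                               default_output_bw=output_bw, dummy_input=dummy_input)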

    # Set and freeze encodings to use same quantization grid and then invoke compute encodings
    sim.set_and_freeze_param_encodings(encoding_path='./resnet18.encodings')
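To see why the parameter encodings must stay frozen, consider what a fake quantization op does: it quantizes a value to an integer grid and immediately dequantizes it. The plain-Python sketch below assumes an asymmetric scale/offset convention (chosen for illustration, not taken from AIMET's source) and uses the example's 4-bit parameter width. A weight that AdaRound placed exactly on the frozen grid passes through unchanged, while a re-derived encoding with a different scale moves it to another value, which would discard AdaRound's rounding decision. The function names are hypothetical.

```python
def fake_quantize(x, scale, offset, bitwidth=8):
    """Quantize-dequantize x on an unsigned integer grid.

    Assumed convention: q = clamp(round(x / scale) - offset, 0, 2**bitwidth - 1) and
    x_hat = (q + offset) * scale, where offset = round(min / scale) is <= 0.
    """
    q = round(x / scale) - offset
    q = min(max(q, 0), 2 ** bitwidth - 1)
    return (q + offset) * scale


def encoding_from_min(min_val, scale):
    """Offset that places the encoding minimum min_val on the grid."""
    return round(min_val / scale)


if __name__ == "__main__":
    w = 0.4  # a weight that AdaRound already placed on the grid of its 4-bit encoding
    frozen = fake_quantize(w, 0.1, encoding_from_min(-1.0, 0.1), bitwidth=4)
    # If compute_encodings() re-derived the parameter encoding, the grid (and w's grid point) would move.
    recomputed = fake_quantize(w, 0.15, encoding_from_min(-1.0, 0.15), bitwidth=4)
    print(frozen, recomputed)
```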

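An example user-created function that is called back from compute_encodings()

Even though AIMET has added "quantizer" nodes to the model graph, the model is not yet ready to be used. Before we can use the sim model for inference or training, we need to find appropriate scale/offset quantization parameters for each "quantizer" node. For activation quantization nodes, we need to pass unlabeled data samples through the model to collect range statistics, which then let AIMET calculate appropriate scale/offset quantization parameters. This process is sometimes referred to as calibration; AIMET simply refers to it as "computing encodings".

So we create a routine to pass unlabeled data samples through the model. This should be fairly simple: use the existing training or validation data loader to extract some samples and pass them to the model. We don't need to compute any loss metrics, so we can ignore the model output for this purpose. A few pointers regarding the data samples:

In practice, we need only a small subset of the whole data set to compute the encodings. For example, the ImageNet training set contains more than 1M samples, but computing the encodings needs only 500 or 1000 of them.

It may be beneficial if the samples used for computing encodings are well distributed. It is not necessary to cover all classes, since we are only looking at the range of values at every layer activation. However, we definitely want to avoid an extreme scenario such as using only "dark" or only "bright" samples; for example, using only pictures captured at night might not give ideal results.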

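def pass_calibration_data(sim_model, forward_pass_callback_args=None):
    """
    The User of the QuantizationSimModel API is expected to write this function based on their data set.
    This is not a working function and is provided only as a guideline.

    compute_encodings() passes forward_pass_callback_args to this callback, so it is accepted here
    as an optional second argument even though this example does not use it.

    :param sim_model: The model (sim.model) to run the calibration forward passes on
    :param forward_pass_callback_args: Optional user data forwarded by compute_encodings(); unused here
    :return: None
    """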

    # User action required
    # The following line is an example of how to use the ImageNet data's validation data loader.
    # Replace the following line with your own dataset's validation data loader.
    data_loader = ImageNetDataPipeline.get_val_dataloader()

    # User action required
    # For computing the activation encodings, around 1000 unlabelled data samples are required.
    # Edit the following 2 lines based on your batch size.
    # batch_size * max_batch_counter should be 1024
    batch_size = 64
    max_batch_counter = 16

    sim_model.eval()

    current_batch_counter = 0
    with torch.no_grad():
        for input_data, target_data in data_loader:

            inputs_batch = input_data  # labels are ignored
            sim_model(inputs_batch)

            current_batch_counter += 1
            if current_batch_counter == max_batch_counter:
                break
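Conceptually, every activation quantizer watches the tensors that flow through it while pass_calibration_data() runs and keeps range statistics. The plain-Python sketch below tracks a running min/max across batches and stops after max_batch_counter batches, as the loop above does. It only illustrates the idea: MinMaxObserver and calibrate are hypothetical names, and to our understanding the post_training_tf_enhanced scheme used in this example collects histogram-based statistics rather than a plain min/max.

```python
class MinMaxObserver:
    """Keeps the running minimum and maximum of every value seen during calibration."""

    def __init__(self):
        self.min = float("inf")
        self.max = float("-inf")
        self.num_batches = 0

    def observe(self, batch):
        """batch: any iterable of floats, e.g. a flattened activation tensor."""
        values = list(batch)
        if not values:
            return
        self.min = min(self.min, min(values))
        self.max = max(self.max, max(values))
        self.num_batches += 1


def calibrate(observer, batches, max_batch_counter):
    """Feed at most max_batch_counter batches to the observer, like pass_calibration_data()."""
    for index, batch in enumerate(batches):
        if index == max_batch_counter:
            break
        observer.observe(batch)
    return observer
```

Because the observer only keeps the extremes, a few unrepresentative batches can stretch the range, which is why the sample-selection pointers above matter.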

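Compute the quantization encodings
Now we call AIMET to pass data through the model using the routine above and then compute the quantization encodings. Encodings here refer to the scale/offset quantization parameters.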

    sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)
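Once range statistics exist, a common way to turn an observed [min, max] range into an encoding is the asymmetric scheme sketched below: widen the range to include zero, split it into 2^bitwidth - 1 steps, and store the integer position of the minimum as the offset. This is a plain-Python illustration under that assumption, not AIMET's exact encoding computation; compute_asymmetric_encoding is a hypothetical name, and the tf_enhanced scheme additionally chooses the range so as to reduce quantization noise.

```python
def compute_asymmetric_encoding(min_val, max_val, bitwidth=8):
    """Turn an observed [min_val, max_val] range into (scale, offset, enc_min, enc_max).

    The range is widened to include 0.0 so that zero is exactly representable on the grid;
    offset = round(enc_min / scale) is the (non-positive) integer position of the minimum.
    """
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    num_steps = 2 ** bitwidth - 1
    # Guard against an all-zero range, which would otherwise give scale == 0.
    scale = max((max_val - min_val) / num_steps, 1e-12)
    offset = round(min_val / scale)
    enc_min = offset * scale
    enc_max = enc_min + num_steps * scale
    return scale, offset, enc_min, enc_max
```

Snapping the minimum to an integer multiple of the scale is what lets a real value of 0.0 map to an exact integer, which matters for zero padding and ReLU outputs.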

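Determine the simulated accuracy

Now the QuantizationSim model is ready to be used for inference. First, we can pass this model to an evaluation routine, which will now give us a simulated quantized accuracy score.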

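    # ImageNetDataPipeline is the user's own evaluation helper; this example keeps the model on the CPU
    use_cuda = False
    accuracy = ImageNetDataPipeline.evaluate(sim.model, use_cuda)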
    print(accuracy)

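Export the model

So we have an improved model after AdaRound. The next step is to actually take this model to the target. For this, we need to export the model with the updated weights and without the fake quantization ops. We also export the encodings (scale/offset quantization parameters): the parameter encodings frozen after AdaRound and the activation encodings computed above. AIMET QuantizationSimModel provides an export API for this purpose.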

    # Export the model which saves pytorch model without any simulation nodes and saves encodings file for both
    # activations and parameters in JSON format
    sim.export(path='./', filename_prefix='quantized_resnet18', dummy_input=dummy_input.cpu())
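After export, the encodings file can be inspected with the standard json module. The sketch below assumes the layout used by AIMET 1.x exports, namely top-level "activation_encodings" and "param_encodings" dictionaries that map tensor names to lists of per-channel entries with "bitwidth", "min" and "max" keys. That layout is an assumption, so check it against your own exported file (newer AIMET versions may use a different format). The helper names are hypothetical.

```python
import json


def summarize_encodings(encodings):
    """Map each tensor name to (bitwidth, overall min, overall max) across its per-channel entries.

    Assumed layout (AIMET 1.x style; verify against your own file):
    {"activation_encodings": {name: [{"bitwidth": ..., "min": ..., "max": ..., ...}, ...]},
     "param_encodings":      {name: [{...}, ...]}}
    """
    summary = {}
    for section in ("activation_encodings", "param_encodings"):
        for name, entries in encodings.get(section, {}).items():
            summary[name] = (
                entries[0]["bitwidth"],
                min(entry["min"] for entry in entries),
                max(entry["max"] for entry in entries),
            )
    return summary


def load_and_summarize(path):
    """Read an exported .encodings JSON file and summarize it."""
    with open(path) as f:
        return summarize_encodings(json.load(f))
```

Assuming the default naming, load_and_summarize("./quantized_resnet18.encodings") would list every quantized tensor with its bitwidth and range; parameter entries should show the 4-bit width used with AdaRound.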

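1.1.8 Cross-Layer Equalization API

1.1.8.1 User Guide Link

To learn more about this technique, please see Cross-Layer Equalization

1.1.8.2 Examples Notebook Link

For an end-to-end notebook showing how to use PyTorch Cross-Layer Equalization, please see here.

1.1.8.3 Introduction

The AIMET PyTorch Cross-Layer Equalization feature consists of 3 steps:

BatchNorm Folding
Cross-Layer Scaling
High Bias Folding

1.1.8.4 Cross-Layer Equalization API

The following API performs BatchNorm folding, followed by Cross-Layer Scaling, followed by High Bias folding.

Note: If the model does not have BatchNorm layers, High Bias folding does not take place when using the following API.

API for Cross-Layer Equalization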

aimet_torch.cross_layer_equalization.equalize_model(model, input_shapes, dummy_input=None)

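High-level API to perform Cross-Layer Equalization (CLE) on the given model. The model is equalized in place.

Parameters:

model (Module) – Model to equalize
input_shapes (Union[Tuple, List[Tuple]]) – Shape of the input (a tuple, or a list of tuples if there are multiple inputs)
dummy_input (Union[Tensor, Tuple, None]) – Dummy input to the model. Can be a Tensor or a Tuple of Tensors

Returns: None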

1.1.8.5 Code Example

Required imports

from torchvision import models
from aimet_torch.cross_layer_equalization import equalize_model

Cross-layer equalization in auto mode

def cross_layer_equalization_auto():
    model = models.resnet18(pretrained=True)

    input_shape = (1, 3, 224, 224)

    model = model.eval()

    # Performs BatchNorm fold, Cross layer scaling and High bias folding
    equalize_model(model, input_shape)

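1.1.8.6 Primitive APIs

1.1.8.6.1 Introduction

If the user wants to change the order of the Cross-Layer Equalization steps, skip some of the features, or manually tune the list of layers that need to be equalized, the following APIs can be used.

The higher-level APIs can be used to apply one or more of the features in sequence. They automatically find the layers to be folded or scaled.

The lower-level APIs can be used to manually tune the list of layers to be folded. The user has to pass the list of layers in the correct order in which they appear in the model.

Note: Before using High Bias Fold, Cross-Layer Scaling (CLS) needs to be applied, and the scaling factors obtained from CLS need to be plugged into High Bias Fold. Also, if there are BatchNorm layers, they need to be folded and the information saved so it can be plugged into the High Bias Fold API.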

1.1.8.6.2 ClsSetInfo Definition

class aimet_torch.cross_layer_equalization.ClsSetInfo(cls_pair_1, cls_pair_2=None)

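This class holds information about the layers in a CLS set, along with the corresponding scaling factors and other information, such as whether there is a ReLU activation function between the CLS set layers.

If depthwise-separable layers are being folded, the constructor needs 2 pairs.

Parameters:

cls_pair_1 (ClsSetLayerPairInfo) – Pair between two conv layers, or between a conv and a depthwise conv
cls_pair_2 (Optional[ClsSetLayerPairInfo]) – Pair between a depthwise conv and a pointwise conv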

class ClsSetLayerPairInfo(layer1, layer2, scale_factor, relu_activation_between_layers)

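Models a pair of layers that were scaled using CLS, along with related information.

Parameters:

layer1 (Conv2d) – Layer whose bias is folded
layer2 (Conv2d) – Layer to which the bias of the previous layer is folded
scale_factor (ndarray) – Scale factor found by Cross-Layer Scaling, used to scale the BN parameters
relu_activation_between_layers (bool) – True if the activation between layer1 and layer2 is ReLU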

1.1.8.6.3 Higher-Level APIs for Cross-Layer Equalization

API for BatchNorm Folding

aimet_torch.batch_norm_fold.fold_all_batch_norms(model, input_shapes, dummy_input=None)

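Folds all batch_norm layers in a model into the weights of the corresponding conv layers

Parameters:

model (Module) – Model
input_shapes (Union[Tuple, List[Tuple]]) – Input shapes for the model (can be one or multiple inputs)
dummy_input (Union[Tensor, Tuple, None]) – Dummy input to the model. Can be a Tensor or a Tuple of Tensors

Return type: List[Tuple[Union[Linear, Conv1d, Conv2d, ConvTranspose2d], Union[BatchNorm1d, BatchNorm2d]]]
Returns: A list of layer pairs [(Conv/Linear, BN layer that got folded)]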
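BatchNorm folding follows directly from the BN formula. For one output channel with weights w and bias b, followed by BN with parameters gamma and beta, running mean mu and running variance var, the folded layer uses w' = w * gamma / sqrt(var + eps) and b' = (b - mu) * gamma / sqrt(var + eps) + beta. The plain-Python check below verifies this for a single output channel of a linear layer in inference mode; it illustrates the math, not AIMET's implementation, and the function names are hypothetical.

```python
import math


def fold_bn_into_channel(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold one BatchNorm channel into the preceding layer's weights and bias for that channel."""
    factor = gamma / math.sqrt(var + eps)
    folded_weights = [w * factor for w in weights]
    folded_bias = (bias - mean) * factor + beta
    return folded_weights, folded_bias


def linear_channel(weights, bias, x):
    """Output of one channel of a fully connected layer."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm for a single channel value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta
```

After folding, the BN layer is no longer needed at inference time, which is why fold_all_batch_norms() returns the (Conv/Linear, BN) pairs: the BN parameters are kept only for the later high bias fold.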

API for Cross-Layer Scaling

aimet_torch.cross_layer_equalization.CrossLayerScaling.scale_model(model, input_shapes, dummy_input=None)

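Uses Cross-Layer Scaling to scale all applicable layers in the given model

Parameters:

model (Module) – Model to scale
input_shapes (Union[Tuple, List[Tuple]]) – Input shapes for the model (can be one or multiple inputs)
dummy_input (Union[Tensor, List[Tensor], None]) – Dummy input to the model, used to parse the model graph. The user is expected to place the tensors on the appropriate device.

Return type: List[ClsSetInfo]
Returns: CLS information for each CLS set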
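Cross-layer scaling relies on ReLU being positively homogeneous: ReLU(x / s) = ReLU(x) / s for s > 0. Dividing output channel i of layer 1 (weights and bias) by s_i and multiplying input channel i of layer 2 by s_i therefore leaves the network output unchanged. Choosing s_i = sqrt(r1_i * r2_i) / r2_i, as described in the Data-Free Quantization paper (Nagel et al., 2019), makes both per-channel weight ranges equal to sqrt(r1_i * r2_i). The plain-Python sketch below applies this to two small fully connected layers; it illustrates the math behind scale_model()/scale_cls_sets(), not AIMET's code, and the function names are hypothetical.

```python
import math


def channel_ranges(w1, w2):
    """(range of output channel i in layer 1, range of input channel i in layer 2) for each i."""
    return [(max(abs(v) for v in w1[i]), max(abs(row[i]) for row in w2)) for i in range(len(w1))]


def cross_layer_scale(w1, b1, w2):
    """Equalize per-channel ranges; w1 rows are output channels, w2 columns are input channels.

    Assumes no channel has an all-zero weight range. Returns (w1', b1', w2', scales).
    """
    scales = [math.sqrt(r1 * r2) / r2 for r1, r2 in channel_ranges(w1, w2)]
    new_w1 = [[v / s for v in row] for row, s in zip(w1, scales)]
    new_b1 = [b / s for b, s in zip(b1, scales)]
    new_w2 = [[v * s for v, s in zip(row, scales)] for row in w2]
    return new_w1, new_b1, new_w2, scales


def two_layer_forward(w1, b1, w2, x):
    """Linear -> ReLU -> Linear (layer 2 without bias), for checking that outputs are unchanged."""
    hidden = [max(sum(w * xi for w, xi in zip(row, x)) + b, 0.0) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]
```

Equal ranges mean that a single per-tensor quantization grid fits both layers better. For ReLU6, positive homogeneity does not hold, which is why the code examples replace ReLU6 with ReLU before scaling.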

API for High Bias Folding

aimet_torch.cross_layer_equalization.HighBiasFold.bias_fold(cls_set_info_list, bn_layers)

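Folds bias values greater than 3 * sigma into the next layer's bias

Parameters:

cls_set_info_list (List[ClsSetInfo]) – List of info elements for each CLS set
bn_layers (Dict[Union[Conv2d, ConvTranspose2d], BatchNorm2d]) – Key: Conv/Linear layer; Value: corresponding folded BN layer

Returns: None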
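High bias folding uses the BatchNorm statistics saved during folding. After BN, the pre-activation of channel i is roughly Gaussian with mean beta_i and standard deviation gamma_i, so it is rarely below c_i = max(0, beta_i - 3 * gamma_i). For inputs with x_i >= c_i, ReLU(x_i) = ReLU(x_i - c_i) + c_i, so c_i can be subtracted from layer 1's bias and W2 @ c added to layer 2's bias. The plain-Python sketch below shows that step for two small fully connected layers, following the Data-Free Quantization paper (Nagel et al., 2019) rather than AIMET's source. The function names are hypothetical, and the sketch assumes the BN parameters have already been divided by the CLS scale factors, which is the role of scale_factor in ClsSetLayerPairInfo.

```python
def high_bias_fold(b1, w2, b2, bn_beta, bn_gamma):
    """Move the part of layer 1's bias that ReLU almost always passes into layer 2's bias.

    Returns (new_b1, new_b2, c) with c_i = max(0, beta_i - 3 * |gamma_i|).
    """
    c = [max(0.0, beta - 3.0 * abs(gamma)) for beta, gamma in zip(bn_beta, bn_gamma)]
    new_b1 = [b - ci for b, ci in zip(b1, c)]
    new_b2 = [b + sum(w * ci for w, ci in zip(row, c)) for row, b in zip(w2, b2)]
    return new_b1, new_b2, c


def forward(w1, b1, w2, b2, x):
    """Linear -> ReLU -> Linear, used to compare the network before and after folding."""
    hidden = [max(sum(w * xi for w, xi in zip(row, x)) + b, 0.0) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
```

If a pre-activation does fall below c_i, the folded network differs slightly from the original; that is the approximation accepted by the 3-sigma rule in exchange for a smaller, easier-to-quantize activation range.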

1.1.8.6.4 Code Example for Higher-Level APIs

Required imports

import torch
from torchvision import models
from aimet_torch import batch_norm_fold
from aimet_torch import cross_layer_equalization
from aimet_torch import utils

Cross-layer equalization in auto mode, calling each API individually

def cross_layer_equalization_auto_step_by_step():
    model = models.resnet18(pretrained=True)

    model = model.eval()
    input_shape = (1, 3, 224, 224)
    # Fold BatchNorm layers
    folded_pairs = batch_norm_fold.fold_all_batch_norms(model, input_shape)
    bn_dict = {}
    for conv_bn in folded_pairs:
        bn_dict[conv_bn[0]] = conv_bn[1]

    # Replace any ReLU6 layers with ReLU
    utils.replace_modules_of_type1_with_type2(model, torch.nn.ReLU6, torch.nn.ReLU)

    # Perform cross-layer scaling on applicable layer sets
    cls_set_info_list = cross_layer_equalization.CrossLayerScaling.scale_model(model, input_shape)

    # Perform high-bias fold
    cross_layer_equalization.HighBiasFold.bias_fold(cls_set_info_list, bn_dict)

1.1.8.6.5 Lower-Level APIs for Cross-Layer Equalization

API for BatchNorm Folding

aimet_torch.batch_norm_fold.fold_given_batch_norms(model, layer_pairs)

Folds a given set of batch_norm layers into conv layers

Parameters:

model – Model
layer_pairs – Pairs of conv and batch_norm layers to use for folding

Returns: None

API for Cross-Layer Scaling

aimet_torch.cross_layer_equalization.CrossLayerScaling.scale_cls_sets(cls_sets)

Scales multiple CLS sets

Parameters: cls_sets (List[Union[Tuple[Conv2d, Conv2d], Tuple[Conv2d, Conv2d, Conv2d]]]) – List of CLS sets
Return type: List[Union[ndarray, Tuple[ndarray]]]
Returns: Scaling factors calculated and applied for each CLS set, in order

API for High Bias Folding

aimet_torch.cross_layer_equalization.HighBiasFold.bias_fold(cls_set_info_list, bn_layers)

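Folds bias values greater than 3 * sigma into the next layer's bias

Parameters:

cls_set_info_list (List[ClsSetInfo]) – List of info elements for each CLS set
bn_layers (Dict[Union[Conv2d, ConvTranspose2d], BatchNorm2d]) – Key: Conv/Linear layer; Value: corresponding folded BN layer

Returns: None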

1.1.8.6.6 Code Example for Lower-Level APIs

Required imports

import torch
from torchvision import models
from aimet_torch import batch_norm_fold
from aimet_torch import cross_layer_equalization
from aimet_torch import utils

Cross-layer equalization in manual mode

def cross_layer_equalization_manual():
    model = models.resnet18(pretrained=True)

    model = model.eval()

    # Batch Norm Fold
    # Create a list of conv/linear and BN layers for folding forward or backward
    layer_list = [(model.conv1, model.bn1),
                  (model.layer1[0].conv1, model.layer1[0].bn1)]

    # Save the corresponding BN layers (needed only for high bias folding)
    bn_dict = {}
    for conv_bn in layer_list:
        bn_dict[conv_bn[0]] = conv_bn[1]

    batch_norm_fold.fold_given_batch_norms(model, layer_list)

    # Replace any ReLU6 layers with ReLU
    utils.replace_modules_of_type1_with_type2(model, torch.nn.ReLU6, torch.nn.ReLU)

    # Cross Layer Scaling
    # Create a list of consecutive conv layers to be equalized
    consecutive_layer_list = [(model.conv1, model.layer1[0].conv1),
                              (model.layer1[0].conv1, model.layer1[0].conv2)]

    scaling_factor_list = cross_layer_equalization.CrossLayerScaling.scale_cls_sets(consecutive_layer_list)

    # High Bias Fold
    # Create a list of consecutive conv layers whose previous layers bias has to be folded to next layers bias
    ClsSetInfo = cross_layer_equalization.ClsSetInfo
    ClsPairInfo = cross_layer_equalization.ClsSetInfo.ClsSetLayerPairInfo
    cls_set_info_list = [ClsSetInfo(ClsPairInfo(model.conv1, model.layer1[0].conv1, scaling_factor_list[0], True)),
                         ClsSetInfo(ClsPairInfo(model.layer1[0].conv1, model.layer1[0].conv2, scaling_factor_list[1], True))]

    cross_layer_equalization.HighBiasFold.bias_fold(cls_set_info_list, bn_dict)

Cross-layer equalization in manual mode for depthwise separable layers

def cross_layer_equalization_depthwise_layers():
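    # MobileNetV2 is not defined in this snippet: use a MobileNetV2 implementation whose features[1].conv is
    # Sequential(depthwise conv, BN, ReLU6, pointwise conv, BN), so that the indices below are valid
    model = MobileNetV2().to(torch.device('cpu'))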
    model.eval()
    # Batch Norm Fold
    # Create a list of conv/linear and BN layers for folding forward or backward
    layer_list = [(model.features[0][0], model.features[0][1]),
                  (model.features[1].conv[0], model.features[1].conv[1]),
                  (model.features[1].conv[3], model.features[1].conv[4])]

    # Save the corresponding BN layers (needed only for high bias folding)
    bn_dict = {}
    for conv_bn in layer_list:
        bn_dict[conv_bn[0]] = conv_bn[1]

    batch_norm_fold.fold_given_batch_norms(model, layer_list)

    # Replace any ReLU6 layers with ReLU
    utils.replace_modules_of_type1_with_type2(model, torch.nn.ReLU6, torch.nn.ReLU)

    # Cross Layer Scaling
    # Create a list of consecutive conv layers to be equalized
    consecutive_layer_list = [(model.features[0][0], model.features[1].conv[0], model.features[1].conv[3])]
    scaling_factor_list = cross_layer_equalization.CrossLayerScaling.scale_cls_sets(consecutive_layer_list)

    # High Bias Fold
    # Create a list of consecutive conv layers whose previous layers bias has to be folded to next layers bias
    ClsSetInfo = cross_layer_equalization.ClsSetInfo
    ClsPairInfo = cross_layer_equalization.ClsSetInfo.ClsSetLayerPairInfo
    cls_set_info_list = [ClsSetInfo(ClsPairInfo(model.features[0][0], model.features[1].conv[0], scaling_factor_list[0][0], True)),
                         ClsSetInfo(ClsPairInfo(model.features[1].conv[0], model.features[1].conv[3], scaling_factor_list[0][1], True))]

    cross_layer_equalization.HighBiasFold.bias_fold(cls_set_info_list, bn_dict)