Machine Learning HW10: Adversarial Attack

I. Task Description
II. Algorithms
III. Experiments

I. Task Description

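We use pytorchcv to obtain pretrained CIFAR-10 models, so the environment must be set up first. We also need to download the data we want to attack (200 images). ToTensor first rescales the raw pixel values from the 0-255 range to the 0-1 range, and the images are then normalized (subtract the mean, divide by the standard deviation).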

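# imports used by the snippets in this post
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from pytorchcv.model_provider import get_model as ptcv_get_model

# assumed setup: use the GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# the mean and std are the calculated statistics from cifar_10 dataset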
cifar_10_mean = (0.491, 0.482, 0.447) # mean for the three channels of cifar_10 images
cifar_10_std = (0.202, 0.199, 0.201) # std for the three channels of cifar_10 images

# convert mean and std to 3-dimensional tensors for future operations
mean = torch.tensor(cifar_10_mean).to(device).view(3, 1, 1)
std = torch.tensor(cifar_10_std).to(device).view(3, 1, 1)

epsilon = 8/255/std
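The attack runs on normalized inputs, so the pixel-space budget of 8/255 is divided by the per-channel std. The following is a minimal sketch of why that works, assuming CPU tensors and a made-up random image; the names `x_pixel`, `x_norm`, `x_back`, and `max_shift` are illustrative and not part of the homework code.

```python
import torch

# Statistics copied from the snippet above; everything runs on the CPU here.
mean = torch.tensor((0.491, 0.482, 0.447)).view(3, 1, 1)
std = torch.tensor((0.202, 0.199, 0.201)).view(3, 1, 1)
epsilon = 8 / 255 / std  # budget in normalized units, one value per channel

# A made-up 3x32x32 image in the 0-1 range (illustrative data only).
torch.manual_seed(0)
x_pixel = torch.rand(3, 32, 32)
x_norm = (x_pixel - mean) / std

# Shift every normalized value by the full budget, then undo the normalization.
x_back = (x_norm + epsilon) * std + mean

# Largest shift measured in 0-255 pixel units.
max_shift = ((x_back - x_pixel) * 255).abs().max().item()
```

In this sketch `max_shift` comes out at about 8 for every channel, which is the pixel-space budget the normalized `epsilon` is meant to encode.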

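ToTensor() also converts an image from shape (height, width, channels) to shape (channels, height, width), so the shape must be transposed back to the original layout.
Because the data loader samples one batch at a time, we use np.transpose to turn (batch_size, channel, height, width) back into (batch_size, height, width, channel).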
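As a small illustration of that axis reordering, the sketch below builds a dummy batch (the array names and the batch size of 4 are assumptions for the example) and moves the channel axis to the end.

```python
import numpy as np

# Illustrative batch: 4 images, 3 channels, 32x32 pixels, laid out as (bs, C, H, W).
batch_chw = np.zeros((4, 3, 32, 32), dtype=np.float32)
batch_chw[:, 0] = 1.0  # fill channel 0 so we can check where it ends up

# Move the channel axis to the end: (bs, C, H, W) -> (bs, H, W, C).
batch_hwc = batch_chw.transpose((0, 2, 3, 1))
```

After the transpose, channel 0 of every image is found at `batch_hwc[..., 0]`, which is the layout image-saving libraries usually expect.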

# perform adversarial attack and generate adversarial examples
def gen_adv_examples(model, loader, attack, loss_fn):
    model.eval()
    adv_names = []
    train_acc, train_loss = 0.0, 0.0
    for i, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y, loss_fn) # obtain adversarial examples
        yp = model(x_adv)
        loss = loss_fn(yp, y)
        train_acc += (yp.argmax(dim=1) == y).sum().item()
        train_loss += loss.item() * x.shape[0]
        # store adversarial examples
        adv_ex = ((x_adv) * std + mean).clamp(0, 1) # to 0-1 scale
        adv_ex = (adv_ex * 255).clamp(0, 255) # 0-255 scale
        adv_ex = adv_ex.detach().cpu().data.numpy().round() # round to remove the fractional part
        adv_ex = adv_ex.transpose((0, 2, 3, 1)) # transpose (bs, C, H, W) back to (bs, H, W, C)
        adv_examples = adv_ex if i == 0 else np.r_[adv_examples, adv_ex]
    return adv_examples, train_acc / len(loader.dataset), train_loss / len(loader.dataset)

Ensemble multiple models as your proxy model to increase black-box transferability.

class ensembleNet(nn.Module):
    def __init__(self, model_names):
        super().__init__()
        self.models = nn.ModuleList([ptcv_get_model(name, pretrained=True) for name in model_names])
        self.softmax = nn.Softmax(dim=1)
    def forward(self, x):
        # sum up the logits from all models, then average them
        ensemble_logits = None
        for i, m in enumerate(self.models):
            ensemble_logits = m(x) if i == 0 else ensemble_logits + m(x)
        return ensemble_logits / len(self.models)
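To see the logit averaging in isolation without downloading pretrained weights, here is a hedged sketch. `ToyEnsemble` is a hypothetical stand-in that takes already-built modules, and two tiny linear classifiers play the role of the pretrained CIFAR-10 networks.

```python
import torch
import torch.nn as nn

class ToyEnsemble(nn.Module):
    """Hypothetical stand-in for ensembleNet that takes already-built models."""
    def __init__(self, models):
        super().__init__()
        self.models = nn.ModuleList(models)

    def forward(self, x):
        # Stack the logits of all member models and average over the model axis.
        logits = torch.stack([m(x) for m in self.models], dim=0)
        return logits.mean(dim=0)

torch.manual_seed(0)
# Two tiny linear classifiers stand in for the pretrained networks.
members = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)) for _ in range(2)]
ensemble = ToyEnsemble(members).eval()

x = torch.randn(5, 3, 32, 32)
with torch.no_grad():
    out = ensemble(x)
    expected = (members[0](x) + members[1](x)) / 2
```

Because the ensemble output is an ordinary logit tensor, it can be passed to any of the attack functions in place of a single model.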

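Prerequisites
○ Attack objective: non-targeted attack
○ Attack constraint: L-infinity norm with parameter ε
○ Attack algorithm: FGSM/I-FGSM
○ Attack mode: black-box attack (perform the attack on a proxy network)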

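1. Choose any proxy network to attack the TAs' black-box model.
2. Implement non-targeted adversarial attack methods:
a. FGSM
b. I-FGSM
c. MI-FGSM
3. Increase the transferability of the attack with diverse inputs (DIM).
4. Attack multiple proxy models: ensemble attack.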

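II. Algorithms

1. FGSM

● Fast Gradient Sign Method (FGSM)
FGSM is a gradient-based algorithm for generating adversarial examples. It is a non-targeted attack: the adversarial example does not have to be classified as a specified class, only as something different from the prediction for the original sample. When training a simple neural network we minimize the loss function by moving against the gradient (a minus sign), which is gradient descent. FGSM can be seen as gradient ascent: it uses a plus sign so that the loss is maximized. The sign function returns the sign of a number: 1 if the number is greater than 0, 0 if it equals 0, and -1 if it is less than 0. Here x is the original sample, θ denotes the model weights (i.e. w), and y is the true class of x. Given the original sample, the weights, and the true class, the loss function J yields the network's loss; ∇x denotes the partial derivative with respect to x, that is, the gradient of the loss J with respect to the sample x. The value of ϵ (epsilon) is usually set by hand and can be thought of as a learning rate; once the perturbation exceeds a threshold, the adversarial example becomes noticeable to the human eye.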
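Written out with the symbols defined above, the single FGSM step can be summarized as follows. This is a reconstruction from the description and the code, with x^{adv} as an assumed name for the adversarial example.

```latex
x^{adv} = x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_x J(\theta, x, y)\bigr)
```

Since every component of the sign term lies in {-1, 0, 1}, the perturbation satisfies the L-infinity constraint by construction: no pixel moves by more than ϵ.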

# perform fgsm attack
def fgsm(model, x, y, loss_fn, epsilon=epsilon):
    x_adv = x.detach().clone() # initialize x_adv as original benign image x
    x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
    loss = loss_fn(model(x_adv), y) # calculate loss
    loss.backward() # calculate gradient
    # fgsm: use gradient ascent on x_adv to maximize loss
    grad = x_adv.grad.detach()
    x_adv = x_adv + epsilon * grad.sign()
    return x_adv

2. I-FGSM

● Iterative Fast Gradient Sign Method (I-FGSM)
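The iterative variant can be summarized by the update below. It is reconstructed to match the `ifgsm` code in this post: α is the per-step size, ϵ the total budget, and the clip operator projects each pixel back into [x - ϵ, x + ϵ]. The notation x^{adv}_t is an assumption made for this sketch.

```latex
x^{adv}_0 = x, \qquad
x^{adv}_{t+1} = \operatorname{Clip}_{x,\epsilon}\Bigl( x^{adv}_t + \alpha \cdot \operatorname{sign}\bigl(\nabla_x J(\theta, x^{adv}_t, y)\bigr) \Bigr),
\qquad
\operatorname{Clip}_{x,\epsilon}(z) = \max\bigl(\min(z,\, x + \epsilon),\, x - \epsilon\bigr)
```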

# alpha and num_iter can be decided by yourself
alpha = 0.8/255/std
def ifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20):
    x_adv = x
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        # x_adv = fgsm(model, x_adv, y, loss_fn, alpha) # call fgsm with (epsilon = alpha) to obtain new x_adv
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y) # calculate loss
        loss.backward() # calculate gradient
        # fgsm: use gradient ascent on x_adv to maximize loss
        grad = x_adv.grad.detach()
        x_adv = x_adv + alpha * grad.sign()

        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
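A quick sanity check of the iterative loop is to run it on a toy model and confirm that the loss goes up while the perturbation stays inside the budget. The sketch below restates the loop in a self-contained form; `ifgsm_sketch`, `toy_model`, and the random data are illustrative assumptions, not the homework's models or images.

```python
import torch
import torch.nn as nn

def ifgsm_sketch(model, x, y, loss_fn, epsilon, alpha, num_iter=20):
    """Compact restatement of the iterative loop above (illustrative copy)."""
    x_adv = x
    for _ in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.detach().sign()
        # project back into the epsilon-ball around the original input
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)
    return x_adv.detach()

torch.manual_seed(0)
# A linear classifier on random data stands in for the proxy network.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))

std = torch.tensor((0.202, 0.199, 0.201)).view(3, 1, 1)
epsilon, alpha = 8 / 255 / std, 0.8 / 255 / std

x_adv = ifgsm_sketch(toy_model, x, y, loss_fn, epsilon, alpha)
with torch.no_grad():
    loss_before = loss_fn(toy_model(x), y).item()
    loss_after = loss_fn(toy_model(x_adv), y).item()
# largest perturbation relative to the per-channel budget (should not exceed 1)
max_ratio = ((x_adv - x).abs() / epsilon).max().item()
```

With 20 steps of size 0.8/255 the unclipped perturbation could reach 16/255, so the projection step is what keeps the result within the 8/255 budget.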

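3. MI-FGSM

MI-FGSM boosts adversarial attacks with momentum: the momentum term stabilizes the update direction and helps escape poor local maxima.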
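In formula form, the momentum update can be sketched as below. It is reconstructed to agree with the `mifgsm` code in this post: μ corresponds to the `decay` argument, g_t to the `momentum` tensor (starting from zero), and the gradient is divided by its L1 norm before being accumulated. The symbol names are assumptions made for this sketch.

```latex
g_0 = 0, \qquad
g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(\theta, x^{adv}_t, y)}{\lVert \nabla_x J(\theta, x^{adv}_t, y) \rVert_1},
\qquad
x^{adv}_{t+1} = \operatorname{Clip}_{x,\epsilon}\bigl( x^{adv}_t + \alpha \cdot \operatorname{sign}(g_{t+1}) \bigr)
```

Here the clip operator projects each pixel back into [x - ϵ, x + ϵ], as in the iterative attack.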

def mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20, decay=1.0):
    x_adv = x
    # initialize momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y) # calculate loss
        loss.backward() # calculate gradient
        # momentum calculation: decay * previous momentum + gradient divided by its L1 norm
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad/(grad.abs().sum() + 1e-8)
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv

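Overfitting also happens in adversarial attacks.
● I-FGSM greedily perturbs the image along the sign of the loss gradient, so it easily falls into poor local maxima and overfits the specific network parameters.
● Such overfitted adversarial examples rarely transfer to black-box models.
How can we avoid overfitting the proxy model and increase the transferability of a black-box attack? With data augmentation.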

4. Diverse Input (DIM)

1. Random resizing (resize the input image to a random size)
2. Random padding (pad zeros around the input image in a random manner)
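The two steps above can be sketched as one differentiable transform. This version uses `torch.nn.functional.interpolate` and `torch.nn.functional.pad` rather than torchvision transforms. `random_resize_pad` and its parameters are hypothetical names for illustration, and the 29 to 32 size range is borrowed from the attack code in this post.

```python
import torch
import torch.nn.functional as F

def random_resize_pad(x, out_size=32, min_size=29):
    """Hypothetical helper: randomly shrink a batch, then zero-pad back to out_size."""
    rnd = torch.randint(min_size, out_size + 1, (1,)).item()
    x_small = F.interpolate(x, size=(rnd, rnd), mode="bilinear", align_corners=False)
    left = torch.randint(0, out_size - rnd + 1, (1,)).item()
    top = torch.randint(0, out_size - rnd + 1, (1,)).item()
    right, bottom = out_size - rnd - left, out_size - rnd - top
    # F.pad orders the padding of the last two dims as (left, right, top, bottom)
    return F.pad(x_small, (left, right, top, bottom), value=0.0)

torch.manual_seed(0)
x = torch.rand(2, 3, 32, 32, requires_grad=True)
x_div = random_resize_pad(x)
# Both operations are differentiable, so gradients flow back to the original pixels.
x_div.sum().backward()
```

Keeping the transform differentiable matters for the attack: the gradient used in the update is then aligned with the pixels of the untransformed image.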

def dmi_mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=50, decay=1.0, p=0.5):
    x_adv = x
    # initialize momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        x_in = x_adv # input fed to the model; replaced by a transformed copy below
        if torch.rand(1).item() >= p:
            # resize img to rnd X rnd
            rnd = torch.randint(29, 33, (1,)).item()
            x_in = transforms.Resize((rnd, rnd))(x_in)
            # pad img back to 32 X 32 with 0
            left = torch.randint(0, 32 - rnd + 1, (1,)).item()
            top = torch.randint(0, 32 - rnd + 1, (1,)).item()
            right = 32 - rnd - left
            bottom = 32 - rnd - top
            x_in = transforms.Pad([left, top, right, bottom])(x_in)
        loss = loss_fn(model(x_in), y) # calculate loss on the (possibly transformed) input
        loss.backward() # gradient flows back through the transform to x_adv
        # momentum calculation: decay * previous momentum + gradient divided by its L1 norm
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad/(grad.abs().sum() + 1e-8)
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv

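Ensemble attack
● Choose a list of proxy models
● Choose an attack algorithm (FGSM, I-FGSM, etc.)
● Attack multiple proxy models at the same time
● Ensemble adversarial attack: a study of transferable adversarial examples and black-box attacks
● How to choose suitable proxy models for a black-box attack: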

Evaluation metrics

● The parameter ε is fixed at 8
● Distance measure: L-inf. norm
● Model accuracy is the only evaluation metric
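To make the distance constraint concrete, the sketch below measures the L-inf distance between a benign image and a perturbed copy on the 0-255 scale. The helper `linf_distance` and the random images are assumptions for illustration, not the grading script.

```python
import numpy as np

def linf_distance(benign, adv):
    """Largest absolute per-pixel difference between two 0-255 image arrays."""
    return np.abs(benign.astype(np.int32) - adv.astype(np.int32)).max()

rng = np.random.default_rng(0)
# Made-up 32x32 RGB image and a perturbation of at most 8 per pixel.
benign = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
noise = rng.integers(-8, 9, size=benign.shape)
adv = np.clip(benign.astype(np.int32) + noise, 0, 255).astype(np.uint8)

dist = linf_distance(benign, adv)
within_budget = dist <= 8
```

Casting to a signed integer type before subtracting avoids the wrap-around that would occur when subtracting uint8 arrays directly.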

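III. Experiments

1. Simple Baseline

Run the FGSM method from the TA's code directly. FGSM attacks each image only once. The proxy model is resnet110_cifar10, whose accuracy and loss on the benign images are benign_acc=0.95, benign_loss=0.22678. For the attack, gen_adv_examples is called with fgsm, and the accuracy drops: fgsm_acc=0.59, fgsm_loss=2.49186.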

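2. Medium Baseline

Method: I-FGSM + Ensemble Attack. Compared with fgsm, ifgsm applies the FGSM step repeatedly in a loop. It is combined with an ensemble attack, which attacks several proxy models at once and requires changing the forward function of the ensembleNet class. For the attack, gen_adv_examples is called with the ensemble model and ifgsm, and the accuracy drops sharply: ifgsm_acc = 0.01, ifgsm_loss=17.29498.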

    def forward(self, x):
        ensemble_logits = None
        for i, m in enumerate(self.models):
            ensemble_logits = m(x) if i == 0 else ensemble_logits + m(x)
        return ensemble_logits / len(self.models)


The source model is "resnet110_cifar10"; a plain fgsm attack is applied to dog2.png:

JPEG compression via the imgaug package, with the compression rate set to 70:

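3. Strong Baseline

Method: MI-FGSM + Ensemble Attack (pick right models). Compared with ifgsm, mifgsm adds momentum so that the attack avoids getting stuck in local maxima. In the medium baseline the proxy models were picked at random, which is rather blind. Instead, one can choose undertrained models, where undertrained means two things: the model has been trained for few epochs, and it has not reached its minimum loss on the validation set (val set). Following the paper, we use the training method from https://github.com/kuangliu/pytorch-cifar, pick a resnet18 model, train it for 30 epochs (reaching the best result normally takes about 200 epochs), and add it to ensembleNet. Accuracy and loss after the attack: ensemble_mifgsm_acc = 0.01, ensemble_mifgsm_loss = 12.13276.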

        # momentum calculation
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad/(grad.abs().sum() + 1e-8)
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()

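4. Boss Baseline

Method: DIM-MIFGSM + Ensemble Attack (pick right models). Compared with the strong baseline, mifgsm is replaced by dim-mifgsm, which applies a transform to the attacked images to avoid overfitting. The transform first randomly resizes the image and then randomly pads it back to the original size. The dmi_mifgsm function (listed under Diverse Input above) is written on top of mifgsm. For the attack, gen_adv_examples is called with the ensemble model, and the accuracy and loss after the attack are ensemble_dmi_mifgsm_acc = 0.00, ensemble_dmi_mifgsm_loss = 13.64031.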

