Li Hongyi Machine Learning Homework 10 - Adversarial Attack, FGSM, IFGSM

For the theoretical part, see the post "Li Hongyi's Machine Learning - Adversarial Attack" on iwill323's CSDN blog

Table of contents

goals and methods

Evaluation method

Import packages

Global Settings

Data

transform

Dataset

Proxy Model and Target Model

Evaluate the performance of the target model on benign images

Attack Algorithm

FGSM

I-FGSM

MI-FGSM

Diverse Input (DIM)

attack function

generate attack image function

Ensemble Attack

Ensemble model function

Build an ensemble model

Visualize attack results

attack

FGSM method

I-FGSM method + Ensemble Attack

MI-FGSM + Ensemble Attack (pick right models)

DIM-MIFGSM + Ensemble Attack (pick right models)

Passive Defense—JPEG Compression

attack

defense

Extension: file reading


goals and methods

Use the training data of the target network to train one or more proxy networks (this assignment does not require training them; pre-trained models are used instead). Treat the proxy network as the attack target and use it to generate adversarial inputs; this is a white-box attack on the proxy network. The generated images are then fed to the target network, whose parameters are unknown, which realizes the actual attack.

○ Attack objective: Non-targeted attack

○ Attack algorithm: FGSM/I-FGSM

○ Attack schema: Black box attack (perform attack on proxy network)

○ Increase attack transferability by Diverse input (DIM)

○ Attack more than one proxy model - Ensemble attack

If you are not a student of National Taiwan University, you cannot submit this assignment and therefore cannot see your submission results and actual scores.

Evaluation method

Image pixel values range from 0 to 255. In this assignment, the maximum per-pixel change ε is limited to 8 so that the modification is not too obvious; with ε = 16 the change to the image would be noticeably more visible.

○ ε is fixed to 8
○ Distance measurement: L-inf. norm (a quick check is sketched below)
○ Model accuracy (decline) is the only evaluation criterion
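
As a quick sanity check of the ε budget, here is a minimal sketch that compares one benign image with its saved adversarial counterpart (the file paths are placeholders; any image generated later by create_dir will do):

import numpy as np
from PIL import Image

# compare a benign image with its adversarial version and report the L-inf distance
benign = np.array(Image.open('./data/dog/dog2.png'), dtype=np.int16)
adv = np.array(Image.open('./fgsm/dog/dog2.png'), dtype=np.int16)
print('L-inf distance =', np.abs(adv - benign).max())  # should be <= 8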

Import packages

import torch
import torch.nn as nn
import torchvision
import os
import glob
import shutil
import numpy as np
from PIL import Image
from torchvision.transforms import transforms
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size = 8

Global Settings

This section mainly sets the mean and standard deviation (std) used for image normalization, and ε. ε has to be divided by 255 and by std because the attack operates on normalized tensors: ToTensor first scales raw pixels from [0, 255] to [0, 1] (hence the /255), and Normalize then divides by std, so a raw-pixel budget of 8 becomes 8/255/std in normalized space. A small numerical check follows the code block below.

benign images: images which do not contain adversarial perturbations
adversarial images: images which include adversarial perturbations

# the mean and std are the calculated statistics from cifar_10 dataset
cifar_10_mean = (0.491, 0.482, 0.447) # mean for the three channels of cifar_10 images
cifar_10_std = (0.202, 0.199, 0.201) # std for the three channels of cifar_10 images

# convert mean and std to 3-dimensional tensors for future operations
mean = torch.tensor(cifar_10_mean).to(device).view(3, 1, 1)
std = torch.tensor(cifar_10_std).to(device).view(3, 1, 1)

epsilon = 8/255/std

root = './data' # directory for storing benign images
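
As a small numerical check of the division above (using the red channel as an example), a raw-pixel change of 8 indeed equals 8/255/std after the ToTensor + Normalize transform:

# red-channel statistics taken from cifar_10_mean / cifar_10_std above
mean_r, std_r = 0.491, 0.202
normalize = lambda v: (v / 255 - mean_r) / std_r  # what ToTensor + Normalize do to a raw pixel v

v = 100.0  # an arbitrary raw pixel value in [0, 255]
print(normalize(v + 8) - normalize(v))  # change in normalized space
print(8 / 255 / std_r)                  # the epsilon used above, same value (~0.155)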

Data

transform

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(cifar_10_mean, cifar_10_std)
])

Dataset

The data can be downloaded from the post "Li Hongyi's 2022 Machine Learning HW10 Analysis" on the Machine Learning Craftsman's CSDN blog: 200 images in total, divided into 10 class folders with 20 images per class.

        data_dir
        ├── class_dir
        │   ├── class1.png
        │   ├── ...
        │   ├── class20.png

Given this directory structure, torchvision's ImageFolder can be used directly. For details on ImageFolder, see the post "Dataset reading and splitting" on iwill323's CSDN blog.

adv_set = torchvision.datasets.ImageFolder(os.path.join(root), transform=transform) 
adv_loader = DataLoader(adv_set, batch_size=batch_size, shuffle=False)

Interestingly, the original code instead defines a custom Dataset class, which is short, concise and worth studying.

class AdvDataset(Dataset):
    def __init__(self, data_dir, transform):
        self.images = []
        self.labels = []
        self.names = []
        '''
        data_dir
        ├── class_dir
        │   ├── class1.png
        │   ├── ...
        │   ├── class20.png
        '''
        for i, class_dir in enumerate(sorted(glob.glob(f'{data_dir}/*'))):
            images = sorted(glob.glob(f'{class_dir}/*'))
            self.images += images
            self.labels += ([i] * len(images))  # the i-th class folder read corresponds to label i
            self.names += [os.path.relpath(imgs, data_dir) for imgs in images]  # image paths relative to data_dir
        self.transform = transform
    def __getitem__(self, idx):
        image = self.transform(Image.open(self.images[idx]))
        label = self.labels[idx]
        return image, label
    def __getname__(self):
        return self.names
    def __len__(self):
        return len(self.images)

adv_set = AdvDataset(root, transform=transform)
adv_names = adv_set.__getname__()
adv_loader = DataLoader(adv_set, batch_size=batch_size, shuffle=False)

print(f'number of images = {adv_set.__len__()}')

Proxy Model and Target Model

This assignment uses pre-trained models both as the proxy networks and as the attack target model. These networks are pre-trained on CIFAR-10 and can be loaded from pytorchcv (see its model list); select the models with the _cifar10 suffix.

The target model is resnet110_cifar10. The proxy models chosen later are nin_cifar10, resnet20_cifar10 and preresnet20_cifar10; in other words, adversarial images are generated on these networks and then used to attack resnet110_cifar10.

from pytorchcv.model_provider import get_model as ptcv_get_model

model = ptcv_get_model('resnet110_cifar10', pretrained=True).to(device)
loss_fn = nn.CrossEntropyLoss()

Evaluate the performance of the target model on benign images

def epoch_benign(model, loader, loss_fn):
    model.eval()
    train_acc, train_loss = 0.0, 0.0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            yp = model(x)
            loss = loss_fn(yp, y)
            train_acc += (yp.argmax(dim=1) == y).sum().item()
            train_loss += loss.item() * x.shape[0]
    return train_acc / len(loader.dataset), train_loss / len(loader.dataset)

On the benign images, resnet110_cifar10 reaches benign_acc = 0.95000 and benign_loss = 0.22678.

benign_acc, benign_loss = epoch_benign(model, adv_loader, loss_fn)
print(f'benign_acc = {benign_acc:.5f}, benign_loss = {benign_loss:.5f}')

Attack Algorithm

FGSM

Fast Gradient Sign Method (FGSM). FGSM perturbs the image with a single gradient-sign step.

def fgsm(model, x, y, loss_fn, epsilon=epsilon):    
    x_adv = x.detach().clone() # clone x so that changes to x_adv do not modify x
    x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
    loss = loss_fn(model(x_adv), y) 
    loss.backward()    
    # fgsm: use gradient ascent on x_adv to maximize loss
    grad = x_adv.grad.detach() 
    x_adv = x_adv + epsilon * grad.sign()  # a single step of size epsilon stays inside the epsilon-ball, so no clipping is needed
    return x_adv
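
A minimal usage sketch (reusing model, adv_loader, device and loss_fn from the sections above): attack a single batch and compare the proxy model's accuracy before and after the perturbation.

# white-box check on one batch: accuracy should drop after the FGSM step
x, y = next(iter(adv_loader))
x, y = x.to(device), y.to(device)
x_adv = fgsm(model, x, y, loss_fn)
with torch.no_grad():
    print('benign acc:', (model(x).argmax(dim=1) == y).float().mean().item())
    print('adversarial acc:', (model(x_adv).argmax(dim=1) == y).float().mean().item())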

I-FGSM

Iterative Fast Gradient Sign Method (I-FGSM). Compared with FGSM, I-FGSM applies the FGSM step repeatedly in a loop, which introduces an additional step-size parameter α.

# set alpha as the step size in Global Settings section
# alpha and num_iter can be decided by yourself
alpha = 0.8/255/std
def ifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20):
    x_adv = x    
    for i in range(num_iter):
        # x_adv = fgsm(model, x_adv, y, loss_fn, alpha) # call fgsm with (epsilon = alpha) to obtain new x_adv        
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y) 
        loss.backward()
        # fgsm: use gradient ascent on x_adv to maximize loss
        grad = x_adv.grad.detach()
        x_adv = x_adv + alpha * grad.sign()

        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv

MI-FGSM

https://arxiv.org/pdf/1710.06081.pdf

Compared with I-FGSM, MI-FGSM adds momentum to keep the attack from getting stuck in a poor local maximum (similar in spirit to momentum in an optimizer).

def mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20, decay=0.9):
    x_adv = x
    # initialze momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y) # calculate loss
        loss.backward() # calculate gradient
        # Momentum calculation
        grad = x_adv.grad.detach() 
        grad = decay * momentum +  grad / (grad.abs().sum() + 1e-8)        
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv

Diverse Input (DIM)

If the generated images overfit the proxy model, their attack strength on the target model may drop.

On top of MI-FGSM, DIM-MIFGSM applies a random transform to the attacked image at each iteration to avoid this overfitting. The technique comes from Improving Transferability of Adversarial Examples with Input Diversity (https://arxiv.org/pdf/1803.06978.pdf). The transform in the paper first randomly resizes the image and then randomly pads it back to the original size.

def dmi_mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=50, decay=0.9, p=0.5):
    x_adv = x
    # initialze momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv_raw = x_adv.clone()
        if torch.rand(1).item() >= p:  # apply the random resize/pad augmentation with a certain probability
            #resize img to rnd X rnd
            rnd = torch.randint(29, 33, (1,)).item()
            x_adv = transforms.Resize((rnd, rnd))(x_adv)
            #padding img to 32 X 32 with 0
            left = torch.randint(0, 32 - rnd + 1, (1,)).item()
            top = torch.randint(0, 32 - rnd + 1, (1,)).item()
            right = 32 - rnd - left
            bottom = 32 - rnd - top
            x_adv = transforms.Pad([left, top, right, bottom])(x_adv)
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y)
        loss.backward() 
        # Momentum calculation        
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad/(grad.abs().sum() + 1e-8)
        momentum = grad
        x_adv = x_adv_raw + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv

attack function

generate attack image function

The function gen_adv_examples calls an attack algorithm, generates the adversarial images, and measures the attack effect (the accuracy of the proxy model on the adversarial images).

After the transform, pixel values are normalized (ToTensor maps them to [0, 1] and Normalize then shifts and scales each channel), so saving the attack images requires the inverse operation. The code here is textbook-level.

# perform adversarial attack and generate adversarial examples
def gen_adv_examples(model, loader, attack, loss_fn):
    model.eval()
    adv_names = []
    train_acc, train_loss = 0.0, 0.0
    for i, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y, loss_fn) # obtain adversarial examples
        yp = model(x_adv)
        loss = loss_fn(yp, y)
        _, pred = torch.max(yp, 1)        
        train_acc += (pred == y.detach()).sum().item()
        train_loss += loss.item() * x.shape[0]
        # store adversarial examples
        adv_ex = ((x_adv) * std + mean).clamp(0, 1) # to 0-1 scale
        adv_ex = (adv_ex * 255).clamp(0, 255) # 0-255 scale
        adv_ex = adv_ex.detach().cpu().data.numpy().round() # round to remove decimal part
        adv_ex = adv_ex.transpose((0, 2, 3, 1)) # transpose (bs, C, H, W) back to (bs, H, W, C)
        adv_examples = adv_ex if i == 0 else np.r_[adv_examples, adv_ex]
    return adv_examples, train_acc / len(loader.dataset), train_loss / len(loader.dataset)

# create directory which stores adversarial examples
def create_dir(data_dir, adv_dir, adv_examples, adv_names):
    if os.path.exists(adv_dir) is not True:
        _ = shutil.copytree(data_dir, adv_dir)
    for example, name in zip(adv_examples, adv_names):
        im = Image.fromarray(example.astype(np.uint8)) # image pixel value should be unsigned int
        im.save(os.path.join(adv_dir, name))

Ensemble Attack

Attack multiple proxy models simultaneously. See Delving into Transferable Adversarial Examples and Black-box Attacks.

nn.ModuleList receives a list of submodules (or layers, which must be nn.Module instances) as input and supports append and extend operations just like a Python list. The parameters of the submodules are automatically registered with the network. Note that nn.ModuleList does not define a network by itself; it merely stores modules together, and the order of its elements does not determine their actual position in the network. The model is only fully defined once the forward function specifies how each layer is used.

Ensemble model function

class ensembleNet(nn.Module):
    def __init__(self, model_names):
        super().__init__()
        # ModuleList takes a list of submodules (which must be nn.Module instances)
        # and supports list-like append and extend operations
        self.models = nn.ModuleList([ptcv_get_model(name, pretrained=True) for name in model_names])
        # self.models.append(undertrained_resnet18)  # a proxy network you trained yourself can be appended here
        
    def forward(self, x):
        ensemble_logits = None
        # sum up logits from multiple models  
        for i, m in enumerate(self.models):
            ensemble_logits = m(x) if i == 0 else ensemble_logits + m(x)        
        return ensemble_logits / len(self.models)

Build an ensemble model

The proxy models:

model_names = [
    'nin_cifar10',
    'resnet20_cifar10',
    'preresnet20_cifar10'
]
ensemble_model = ensembleNet(model_names).to(device)
ensemble_model.eval()

Visualize attack results

Adversarial images are generated and saved for each attack. Change the adversarial-image folder path below to read the attack images, feed them to the target network, and visualize the attack effect.

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']    
def show_attck(adv_dir, classes=classes):    
    plt.figure(figsize=(10, 20))
    cnt = 0
    for i, cls_name in enumerate(classes):
        path = f'{cls_name}/{cls_name}1.png'
        # benign image
        cnt += 1
        plt.subplot(len(classes), 4, cnt)
        im = Image.open(os.path.join(root, path))  # benign images live in the root data folder
        logit = model(transform(im).unsqueeze(0).to(device))[0]
        predict = logit.argmax(-1).item()
        prob = logit.softmax(-1)[predict].item()
        plt.title(f'benign: {cls_name}1.png\n{classes[predict]}: {prob:.2%}')
        plt.axis('off')
        plt.imshow(np.array(im))
        # adversarial image
        cnt += 1
        plt.subplot(len(classes), 4, cnt)
        im = Image.open(os.path.join(adv_dir, path))  # adversarial images live in the attack output folder
        logit = model(transform(im).unsqueeze(0).to(device))[0]
        predict = logit.argmax(-1).item()
        prob = logit.softmax(-1)[predict].item()
        plt.title(f'adversarial: {cls_name}1.png\n{classes[predict]}: {prob:.2%}')
        plt.axis('off')
        plt.imshow(np.array(im))
    plt.tight_layout()
    plt.show()

attack

FGSM method

adv_examples, fgsm_acc, fgsm_loss = gen_adv_examples(model, adv_loader, fgsm, loss_fn)
print(f'fgsm_acc = {fgsm_acc:.5f}, fgsm_loss = {fgsm_loss:.5f}')

adv_dir = 'fgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
fgsm_acc = 0.59000, fgsm_loss = 2.49304

The target network's original performance was benign_acc = 0.95000, benign_loss = 0.22678, so this passes the Simple Baseline.

Looking at the attack effect on the target network resnet110_cifar10 (using the visualization code above), some attacks succeed and some fail. This is a white-box attack, since the adversarial images were generated on resnet110_cifar10 itself.

I-FGSM method + Ensemble Attack

First observe the accuracy of the ensemble model on the benign images.

from pytorchcv.model_provider import get_model as ptcv_get_model

benign_acc, benign_loss = epoch_benign(ensemble_model, adv_loader, loss_fn)
print(f'benign_acc = {benign_acc:.5f}, benign_loss = {benign_loss:.5f}')
benign_acc = 0.95000, benign_loss = 0.15440 

attack

adv_examples, ifgsm_acc, ifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, ifgsm, loss_fn)
print(f'ensemble_ifgsm_acc = {ifgsm_acc:.5f}, ensemble_ifgsm_loss = {ifgsm_loss:.5f}')

adv_dir = 'ensemble_ifgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
ensemble_ifgsm_acc = 0.00000, ensemble_ifgsm_loss = 13.41135

This passes the Medium Baseline (acc <= 0.50). Take a look at the attack effect on the target network resnet110_cifar10 (using the visualization code above).

MI-FGSM + Ensemble Attack (pick right models)

According to the post "Li Hongyi's 2022 Machine Learning HW10 Analysis" (Machine Learning Craftsman's CSDN blog), the proxy models for the medium baseline were picked rather blindly. Following Query-Free Adversarial Transfer via Undertrained Surrogates (https://arxiv.org/abs/2007.00806), it is better to choose undertrained models. "Undertrained" covers two aspects: the model is trained for fewer epochs, and it has not yet reached its minimum loss on the validation set. Following an example in the paper, one can use the training recipe from https://github.com/kuangliu/pytorch-cifar, pick the resnet18 model, train it for about 30 epochs (normal training needs roughly 200 epochs to reach the best result), and add it to ensembleNet, as sketched below. (This undertrained model is not actually used in the runs that follow.)
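
A hypothetical sketch of that idea is shown here; the ResNet18 class (from models/resnet.py in the kuangliu/pytorch-cifar repository) and the checkpoint file undertrained_resnet18.pth are assumptions, i.e. you would have to train and save such a model yourself first:

# hypothetical: append an undertrained ResNet18 surrogate to the ensemble
from models import ResNet18  # model definition from kuangliu/pytorch-cifar

undertrained_resnet18 = ResNet18()
undertrained_resnet18.load_state_dict(
    torch.load('undertrained_resnet18.pth', map_location=device))

ensemble_model = ensembleNet(model_names).to(device)
ensemble_model.models.append(undertrained_resnet18.to(device))  # nn.ModuleList supports append
ensemble_model.eval()

The runs below, however, use only the three pretrained pytorchcv proxies defined earlier: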

adv_examples, ifgsm_acc, ifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, mifgsm, loss_fn)
print(f'ensemble_mifgsm_acc = {ifgsm_acc:.5f}, ensemble_mifgsm_loss = {ifgsm_loss:.5f}')

adv_dir = 'ensemble_mifgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
ensemble_mifgsm_acc = 0.00500, ensemble_mifgsm_loss = 13.23710

Take a look at the attack effect on the target network resnet110_cifar10 (using the visualization code above).

DIM-MIFGSM + Ensemble Attack (pick right models)

adv_examples, ifgsm_acc, ifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, dmi_mifgsm, loss_fn)
print(f'ensemble_dmi_mifgsm_acc = {ifgsm_acc:.5f}, ensemble_dim_mifgsm_loss = {ifgsm_loss:.5f}')

adv_dir = 'ensemble_dmi_mifgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
ensemble_dmi_mifgsm_acc = 0.00000, ensemble_dim_mifgsm_loss = 15.16159

Take a look at the attack effect on the target network resnet110_cifar10 (using the visualization code above).

Passive Defense—JPEG Compression

JPEG compression is performed with the imgaug package, with the compression rate set to 70.

Reference: imgaug.augmenters.arithmetic — imgaug 0.4.0 documentation

attack

# original image
path = f'dog/dog2.png'
im = Image.open(f'./data/{path}')
logit = model(transform(im).unsqueeze(0).to(device))[0]
predict = logit.argmax(-1).item()
prob = logit.softmax(-1)[predict].item()
plt.title(f'benign: dog2.png\n{classes[predict]}: {prob:.2%}')
plt.axis('off')
plt.imshow(np.array(im))
plt.tight_layout()
plt.show()

# adversarial image 
adv_im = Image.open(f'./ensemble_dmi_mifgsm/{path}')
logit = model(transform(adv_im).unsqueeze(0).to(device))[0]
predict = logit.argmax(-1).item()
prob = logit.softmax(-1)[predict].item()
plt.title(f'adversarial: dog2.png\n{classes[predict]}: {prob:.2%}')
plt.axis('off')
plt.imshow(np.array(adv_im))
plt.tight_layout()
plt.show()

 

defense

import imgaug.augmenters as iaa

# pre-process image
x = transforms.ToTensor()(adv_im)*255
x = x.permute(1, 2, 0).numpy()
x = x.astype(np.uint8)

# use the imgaug package to perform JPEG compression (compression rate = 70)
compressed_x =  iaa.arithmetic.compress_jpeg(x, compression=70)

logit = model(transform(compressed_x).unsqueeze(0).to(device))[0]
predict = logit.argmax(-1).item()
prob = logit.softmax(-1)[predict].item()
plt.title(f'JPEG adversarial: dog2.png\n{classes[predict]}: {prob:.2%}')
plt.axis('off')


plt.imshow(compressed_x)
plt.tight_layout()
plt.show()

The defense succeeded.
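
To quantify the defense beyond a single image, here is a minimal sketch (reusing AdvDataset, epoch_benign, model, loss_fn and the CIFAR-10 statistics defined above) that JPEG-compresses every image of the DIM-MIFGSM attack folder on the fly and re-measures the target model's accuracy:

import imgaug.augmenters as iaa

# wrap JPEG compression (rate 70, as above) into the transform and re-evaluate
jpeg_transform = transforms.Compose([
    transforms.Lambda(lambda im: iaa.arithmetic.compress_jpeg(np.array(im), compression=70)),
    transforms.ToTensor(),
    transforms.Normalize(cifar_10_mean, cifar_10_std),
])

defended_set = AdvDataset('./ensemble_dmi_mifgsm', transform=jpeg_transform)
defended_loader = DataLoader(defended_set, batch_size=batch_size, shuffle=False)
defended_acc, defended_loss = epoch_benign(model, defended_loader, loss_fn)
print(f'defended_acc = {defended_acc:.5f}, defended_loss = {defended_loss:.5f}')

If JPEG compression removes enough of the perturbation, defended_acc should climb back toward the target model's benign accuracy.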

Extension: file reading

The hand-written dataset class in the original code is worth studying. First, read all the entries in the root folder, sort them, and return them as a list:

>>dir_list = sorted(glob.glob(f'{root}/*'))
>>print(dir_list)

['./data\\airplane', './data\\automobile', './data\\bird', './data\\cat', './data\\deer', './data\\dog', './data\\frog', './data\\horse', './data\\ship', './data\\truck']

Read the first folder in that list and take out the first file name. These file names can be passed directly to Image.open:

>>images = sorted(glob.glob(f'{dir_list[0]}/*'))
>>print(images[0])

./data\airplane\airplane1.png

Get the path relative to root:

>>print(os.path.relpath(images[0], root))

airplane\airplane1.png

Origin blog.csdn.net/iwill323/article/details/128031965