For the theoretical part, see Li Hongyi's Machine Learning - Adversarial Attack_iwill323's Blog - CSDN Blog
goals and methods
The idea is to take one or more proxy networks trained on the same data as the target network (no training is needed in this assignment; pre-trained models are used directly), treat the proxy network as the attack target, and generate adversarial inputs on it. This is a white-box attack on the proxy network. The generated adversarial images are then fed to the target network, whose parameters are unknown, which realizes the black-box attack.
○ Attack objective: Non-targeted attack
○ Attack algorithm: FGSM/I-FGSM
○ Attack schema: Black box attack (perform attack on proxy network)
○ Increase attack transferability by Diverse input (DIM)
○ Attack more than one proxy model - Ensemble attack
Unless you are a National Taiwan University student taking this course, you cannot see submission results or actual scores for this assignment
Evaluation method
Pixel values range from 0 to 255. In this assignment the maximum change per pixel, ε, is limited to 8 so that the perturbation is not too visible. With ε equal to 16 the change becomes more noticeable
○ ε is fixed to 8
○ Distance measurement: L-inf. norm
○ Model accuracy (decline) is the only evaluation criterion
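The L-inf constraint can be checked directly on pixel values. The following is a minimal sketch, assuming each image is given as a flat list of 0-255 integers; `linf_distance` and `within_budget` are hypothetical helper names and are not part of the assignment code.

```python
def linf_distance(benign, adversarial):
    """Largest absolute per-pixel difference between two equally sized images."""
    if len(benign) != len(adversarial):
        raise ValueError("images must have the same number of pixel values")
    return max(abs(a - b) for a, b in zip(benign, adversarial))


def within_budget(benign, adversarial, epsilon=8):
    """True if no pixel moved by more than epsilon (the L-inf constraint)."""
    return linf_distance(benign, adversarial) <= epsilon


# made-up pixel values for illustration
benign = [120, 64, 200, 13]
adversarial = [128, 60, 192, 13]
print(linf_distance(benign, adversarial))
print(within_budget(benign, adversarial))
```

Under this measure only the single largest pixel change counts, so an attack may change every pixel as long as none of them moves by more than ε.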
Import packages
import torch
import torch.nn as nn
import torchvision
import os
import glob
import shutil
import numpy as np
from PIL import Image
from torchvision.transforms import transforms
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size = 8
Global Settings
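These are mainly the mean and standard deviation (std) used for image normalization, plus ε. Because the attacks operate on normalized tensors, ε is divided by 255 and by std so that a budget of 8 on the 0-255 scale is expressed in normalized units. Two terms used below: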
benign images: images which do not contain adversarial perturbations
adversarial images: images which include adversarial perturbations
# the mean and std are the calculated statistics from cifar_10 dataset
cifar_10_mean = (0.491, 0.482, 0.447) # mean for the three channels of cifar_10 images
cifar_10_std = (0.202, 0.199, 0.201) # std for the three channels of cifar_10 images
# convert mean and std to 3-dimensional tensors for future operations
mean = torch.tensor(cifar_10_mean).to(device).view(3, 1, 1)
std = torch.tensor(cifar_10_std).to(device).view(3, 1, 1)
epsilon = 8/255/std
root = './data' # directory for storing benign images
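Why ε is divided by both 255 and std can be verified with plain arithmetic. The sketch below reuses the mean and std values from the settings; `normalize` and `denormalize` are hypothetical helpers that mirror what ToTensor followed by Normalize does to a single pixel value.

```python
# Per-channel CIFAR-10 statistics, copied from the global settings.
cifar_10_mean = (0.491, 0.482, 0.447)
cifar_10_std = (0.202, 0.199, 0.201)


def normalize(pixel, c):
    """Map a 0-255 pixel value of channel c into normalized space."""
    return (pixel / 255 - cifar_10_mean[c]) / cifar_10_std[c]


def denormalize(value, c):
    """Inverse of normalize: back to the 0-255 scale."""
    return (value * cifar_10_std[c] + cifar_10_mean[c]) * 255


eps_pixels = 8
eps_norm = [eps_pixels / 255 / s for s in cifar_10_std]  # same formula as epsilon = 8/255/std

# Shifting a normalized value by eps_norm[c] moves the raw pixel by eps_pixels.
for c in range(3):
    shifted = denormalize(normalize(100, c) + eps_norm[c], c)
    print(c, round(shifted - 100, 6))
```

The normalization is an affine map with slope 1/(255·std) per channel, so a step of 8 pixel levels becomes a step of 8/255/std in normalized space.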
Data
transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(cifar_10_mean, cifar_10_std)
])
Dataset
The dataset can be downloaded from Li Hongyi's 2022 Machine Learning HW10 Analysis_Machine Learning Craftsman's Blog-CSDN Blog. It contains 200 images in 10 class folders, 20 images per class.
data_dir
├── class_dir
│ ├── class1.png
│ ├── ...
│ ├── class20.png
With this directory structure, torchvision's ImageFolder can read the data directly. For ImageFolder, see data set reading and division_iwill323's blog-CSDN blog
adv_set = torchvision.datasets.ImageFolder(os.path.join(root), transform=transform)
adv_loader = DataLoader(adv_set, batch_size=batch_size, shuffle=False)
Interestingly, the original code defines its own Dataset class, which is short and worth studying
class AdvDataset(Dataset):
    def __init__(self, data_dir, transform):
        self.images = []
        self.labels = []
        self.names = []
        '''
        data_dir
        ├── class_dir
        │   ├── class1.png
        │   ├── ...
        │   ├── class20.png
        '''
        for i, class_dir in enumerate(sorted(glob.glob(f'{data_dir}/*'))):
            images = sorted(glob.glob(f'{class_dir}/*'))
            self.images += images
            self.labels += ([i] * len(images))  # the i-th class folder read gets label i
            self.names += [os.path.relpath(imgs, data_dir) for imgs in images]  # path of each image relative to data_dir
        self.transform = transform
    def __getitem__(self, idx):
        image = self.transform(Image.open(self.images[idx]))
        label = self.labels[idx]
        return image, label
    def __getname__(self):
        return self.names
    def __len__(self):
        return len(self.images)
adv_set = AdvDataset(root, transform=transform)
adv_names = adv_set.__getname__()
adv_loader = DataLoader(adv_set, batch_size=batch_size, shuffle=False)
print(f'number of images = {adv_set.__len__()}')
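How the labels come out of the folder layout can be tried without the real data. This is a minimal sketch that builds a throwaway directory tree with empty files; `list_labeled_images` is a hypothetical stand-in for the loop in `AdvDataset.__init__`, not the original code.

```python
import glob
import os
import tempfile


def list_labeled_images(data_dir):
    """Return (relative_name, label) pairs; label is the index of the class folder in sorted order."""
    samples = []
    for label, class_dir in enumerate(sorted(glob.glob(os.path.join(data_dir, '*')))):
        for image_path in sorted(glob.glob(os.path.join(class_dir, '*'))):
            samples.append((os.path.relpath(image_path, data_dir), label))
    return samples


with tempfile.TemporaryDirectory() as tmp:
    # create the folders in non-alphabetical order on purpose
    for cls in ['cat', 'airplane', 'bird']:
        os.makedirs(os.path.join(tmp, cls))
        for k in (1, 2):
            open(os.path.join(tmp, cls, f'{cls}{k}.png'), 'w').close()
    samples = list_labeled_images(tmp)

for name, label in samples:
    print(label, name)
```

The label depends only on the sorted folder names, so the class order is alphabetical regardless of the order in which the folders were created. This matches the alphabetical `classes` list used later for visualization.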
Proxy Model and Target Model
This assignment uses pre-trained models as both the proxy networks and the attack target. These networks are pre-trained on CIFAR-10 and can be imported from pytorchcv. The pytorchcv model list shows the available models; choose those with the _cifar10 suffix.
The target model is resnet110_cifar10. The proxy models chosen later are nin_cifar10, resnet20_cifar10 and preresnet20_cifar10. In other words, adversarial images are generated on these networks and then used to attack resnet110_cifar10
from pytorchcv.model_provider import get_model as ptcv_get_model
model = ptcv_get_model('resnet110_cifar10', pretrained=True).to(device)
loss_fn = nn.CrossEntropyLoss()
Evaluate the performance of the target model on benign images
def epoch_benign(model, loader, loss_fn):
    model.eval()
    train_acc, train_loss = 0.0, 0.0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            yp = model(x)
            loss = loss_fn(yp, y)
            train_acc += (yp.argmax(dim=1) == y).sum().item()
            train_loss += loss.item() * x.shape[0]
    return train_acc / len(loader.dataset), train_loss / len(loader.dataset)
On the benign images, resnet110_cifar10 reaches benign_acc = 0.95 and benign_loss = 0.22678.
benign_acc, benign_loss = epoch_benign(model, adv_loader, loss_fn)
print(f'benign_acc = {benign_acc:.5f}, benign_loss = {benign_loss:.5f}')
Attack Algorithm
FGSM
Fast Gradient Sign Method (FGSM). FGSM takes a single gradient-sign step of size ε on the image.
def fgsm(model, x, y, loss_fn, epsilon=epsilon):
    x_adv = x.detach().clone()  # clone x so that changing x_adv does not change x
    x_adv.requires_grad = True  # need to obtain gradient of x_adv, thus set required grad
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # fgsm: use gradient ascent on x_adv to maximize loss
    grad = x_adv.grad.detach()
    x_adv = x_adv + epsilon * grad.sign()  # one step of size epsilon cannot leave the epsilon-ball, so no clip is needed
    return x_adv
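The gradient-sign step is easy to follow on a model small enough to differentiate by hand. The sketch below is a toy stand-in, not the assignment code: it uses a three-feature logistic regression with made-up weights, for which the gradient of the loss with respect to the input is (p - y)·w.

```python
import math


def sign(v):
    """-1, 0 or +1, like torch.sign for a scalar."""
    return (v > 0) - (v < 0)


def loss_and_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # dL/dx for logistic regression
    return loss, grad_x


def fgsm_step(w, b, x, y, epsilon):
    """One FGSM step: move every input coordinate by epsilon in the direction that increases the loss."""
    _, grad_x = loss_and_grad(w, b, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]


# made-up model and input for illustration
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.3, 0.2, 0.7], 1
x_adv = fgsm_step(w, b, x, y, epsilon=0.1)
loss_before, _ = loss_and_grad(w, b, x, y)
loss_after, _ = loss_and_grad(w, b, x_adv, y)
print(loss_before, loss_after)
```

Only the sign of the gradient is used, so every coordinate moves by exactly ε. That is why the result stays on the boundary of the L-inf ball around x.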
I-FGSM
Iterative Fast Gradient Sign Method (I-FGSM). Compared with FGSM, I-FGSM applies the FGSM step repeatedly with a smaller step size α (an additional parameter) and clips the result back into the ε-ball around the original image after every step
# set alpha as the step size in Global Settings section
# alpha and num_iter can be decided by yourself
alpha = 0.8/255/std
def ifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20):
    x_adv = x
    for i in range(num_iter):
        # x_adv = fgsm(model, x_adv, y, loss_fn, alpha)  # call fgsm with (epsilon = alpha) to obtain new x_adv
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True  # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        # fgsm: use gradient ascent on x_adv to maximize loss
        grad = x_adv.grad.detach()
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon)  # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
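The clipping step matters with these settings: 20 steps of 0.8/255 could move a pixel by 16/255, twice the budget of 8/255. The sketch below works this out on two scalar "pixels". It assumes, for readability, an unnormalized 0-1 pixel scale (no division by std) and a gradient sign that never changes, which is the worst case. `clip_to_ball` is a hypothetical list-based equivalent of the `torch.max(torch.min(...))` line.

```python
def clip_to_ball(x_adv, x, epsilon):
    """Elementwise equivalent of max(min(x_adv, x + epsilon), x - epsilon)."""
    return [max(min(a, o + epsilon), o - epsilon) for a, o in zip(x_adv, x)]


epsilon = 8 / 255
alpha = 0.8 / 255
num_iter = 20

x = [0.5, 0.5]
x_adv = list(x)
directions = [+1, -1]  # assumed constant gradient signs (worst case)
for _ in range(num_iter):
    x_adv = [a + alpha * d for a, d in zip(x_adv, directions)]
    x_adv = clip_to_ball(x_adv, x, epsilon)

unclipped_total = alpha * num_iter * 255  # pixel levels moved if nothing were clipped
actual = [abs(a - o) * 255 for a, o in zip(x_adv, x)]
print(unclipped_total, actual)
```

In this worst case the perturbation reaches the boundary after about ten steps and stays there. On a real model the gradient sign can flip between steps, which is why the iterations beyond the tenth still help.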
MI-FGSM
https://arxiv.org/pdf/1710.06081.pdf
Compared with I-FGSM, MI-FGSM adds momentum to keep the attack from getting stuck in poor local maxima (the same idea as momentum in optimizers)
def mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20, decay=0.9):
    x_adv = x
    # initialize momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True  # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y)  # calculate loss
        loss.backward()  # calculate gradient
        # Momentum calculation: L1-normalize the gradient (here over the whole batch) and accumulate it
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad / (grad.abs().sum() + 1e-8)
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon)  # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
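Written out, the update in the loop follows the momentum rule of the MI-FGSM paper linked above. Here $\mu$ is the decay factor (`decay` in the code), $J$ is the loss, $g_0 = 0$, and the clip operator stands for the projection onto $[x-\epsilon, x+\epsilon]$ done by the `torch.max(torch.min(...))` line:

```latex
\begin{aligned}
g_{t+1} &= \mu \, g_t + \frac{\nabla_x J(x_t^{adv}, y)}{\lVert \nabla_x J(x_t^{adv}, y) \rVert_1} \\
x_{t+1}^{adv} &= \operatorname{clip}_{x,\epsilon}\!\left( x_t^{adv} + \alpha \cdot \operatorname{sign}(g_{t+1}) \right)
\end{aligned}
```

One difference from the formula: the code takes the L1 norm over the whole batch (`grad.abs().sum()`) rather than per image, so the accumulated gradients of all images in a batch share one scale factor.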
Diverse Input (DIM)
If the generated images overfit the proxy model, they may be less effective against the target model.
On top of MI-FGSM, DIM-MI-FGSM applies a random transform to the image being attacked to reduce this overfitting. The technique comes from the paper Improving Transferability of Adversarial Examples with Input Diversity (https://arxiv.org/pdf/1803.06978.pdf). The transform in the paper first resizes the image randomly and then randomly pads it to the original size
def dmi_mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=50, decay=0.9, p=0.5):
    x_adv = x
    # initialize momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv_raw = x_adv.clone()
        if torch.rand(1).item() >= p:  # apply the input transform with a certain probability
            # resize img to rnd X rnd
            rnd = torch.randint(29, 33, (1,)).item()
            x_adv = transforms.Resize((rnd, rnd))(x_adv)
            # padding img to 32 X 32 with 0
            left = torch.randint(0, 32 - rnd + 1, (1,)).item()
            top = torch.randint(0, 32 - rnd + 1, (1,)).item()
            right = 32 - rnd - left
            bottom = 32 - rnd - top
            x_adv = transforms.Pad([left, top, right, bottom])(x_adv)
        x_adv.requires_grad = True  # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        # Momentum calculation (the gradient is taken w.r.t. the possibly transformed input)
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad/(grad.abs().sum() + 1e-8)
        momentum = grad
        x_adv = x_adv_raw + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon)  # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
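The resize-and-pad arithmetic can be checked in isolation. The sketch below reproduces only the size computation with the standard `random` module in place of `torch.randint`; `random_resize_and_pad_sizes` is a hypothetical helper name, and no image is actually resized.

```python
import random


def random_resize_and_pad_sizes(size=32, low=29, rng=random):
    """Pick a random side length in [low, size] and the paddings that restore size x size."""
    rnd = rng.randint(low, size)       # inclusive on both ends, like torch.randint(29, 33, ...)
    left = rng.randint(0, size - rnd)  # random horizontal offset of the resized image
    top = rng.randint(0, size - rnd)   # random vertical offset of the resized image
    right = size - rnd - left
    bottom = size - rnd - top
    return rnd, (left, top, right, bottom)


rng = random.Random(0)
for _ in range(5):
    rnd, (left, top, right, bottom) = random_resize_and_pad_sizes(rng=rng)
    print(rnd, (left, top, right, bottom), rnd + left + right, rnd + top + bottom)
```

Whatever `rnd` is drawn, the padded result is again 32 × 32, so the model input shape never changes. When `rnd` equals 32 all paddings are zero and the transform leaves the image unchanged.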
Attack function
Function to generate adversarial images
The function gen_adv_examples calls the attack algorithm, generates the adversarial images, and measures the attack effect (the accuracy of the proxy model on the adversarial images).
The transformed images are normalized tensors in (C, H, W) layout rather than 0-255 pixel arrays, so saving them requires the inverse operations: undo the normalization, rescale to 0-255, round, and transpose back to (H, W, C). The code here is a textbook example
# perform adversarial attack and generate adversarial examples
def gen_adv_examples(model, loader, attack, loss_fn):
    model.eval()
    adv_names = []
    train_acc, train_loss = 0.0, 0.0
    for i, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y, loss_fn)  # obtain adversarial examples
        yp = model(x_adv)
        loss = loss_fn(yp, y)
        _, pred = torch.max(yp, 1)
        train_acc += (pred == y.detach()).sum().item()
        train_loss += loss.item() * x.shape[0]
        # store adversarial examples
        adv_ex = ((x_adv) * std + mean).clamp(0, 1)  # to 0-1 scale
        adv_ex = (adv_ex * 255).clamp(0, 255)  # 0-255 scale
        adv_ex = adv_ex.detach().cpu().data.numpy().round()  # round to remove decimal part
        adv_ex = adv_ex.transpose((0, 2, 3, 1))  # transpose (bs, C, H, W) back to (bs, H, W, C)
        adv_examples = adv_ex if i == 0 else np.r_[adv_examples, adv_ex]
    return adv_examples, train_acc / len(loader.dataset), train_loss / len(loader.dataset)
# create directory which stores adversarial examples
def create_dir(data_dir, adv_dir, adv_examples, adv_names):
    if not os.path.exists(adv_dir):
        _ = shutil.copytree(data_dir, adv_dir)
    for example, name in zip(adv_examples, adv_names):
        im = Image.fromarray(example.astype(np.uint8))  # image pixel value should be unsigned int
        im.save(os.path.join(adv_dir, name))
Ensemble Attack
Attack several proxy models at the same time. See Delving into Transferable Adversarial Examples and Black-box Attacks
nn.ModuleList takes a list of submodules (or layers, which must be instances of nn.Module) and supports append and extend like a Python list. The parameters of the submodules are registered with the network automatically. Note that nn.ModuleList does not define a network; it only stores modules together. The order of its elements does not determine the order of execution. The model is only defined once the forward function specifies how the modules are called
Ensemble model class
class ensembleNet(nn.Module):
    def __init__(self, model_names):
        super().__init__()
        # nn.ModuleList takes a list of submodules (nn.Module instances) and supports append/extend like a list
        self.models = nn.ModuleList([ptcv_get_model(name, pretrained=True) for name in model_names])
        # self.models.append(undertrain_resnet18)  # a proxy network you trained yourself can be appended here
    def forward(self, x):
        ensemble_logits = None
        # sum up logits from multiple models
        for i, m in enumerate(self.models):
            ensemble_logits = m(x) if i == 0 else ensemble_logits + m(x)
        return ensemble_logits / len(self.models)
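What the forward pass computes can be seen on plain numbers. The sketch below averages made-up logit vectors from three hypothetical proxy models for one input; `average_logits` and `argmax` are illustrative helpers, not part of the original code.

```python
def average_logits(per_model_logits):
    """Average the logit vectors of several models for one input."""
    n_models = len(per_model_logits)
    n_classes = len(per_model_logits[0])
    return [sum(logits[c] for logits in per_model_logits) / n_models
            for c in range(n_classes)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# made-up logits from three hypothetical proxy models, 3 classes
logits_a = [2.0, 1.0, 0.0]
logits_b = [0.5, 2.5, 0.0]
logits_c = [1.0, 3.0, 0.5]
ensemble = average_logits([logits_a, logits_b, logits_c])
print(ensemble, argmax(ensemble))
print(argmax(logits_a))
```

Because the loss is computed on the averaged logits, its gradient combines the gradients of all proxy models. A perturbation then has to raise the loss across the ensemble rather than on a single network.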
Build an ensemble model
Proxy models
model_names = [
'nin_cifar10',
'resnet20_cifar10',
'preresnet20_cifar10'
]
ensemble_model = ensembleNet(model_names).to(device)
ensemble_model.eval()
Visualize attack results
Each attack generates and saves its adversarial images. Pass the folder of an attack to the function below to read those images, feed them to the target network, and visualize the attack effect
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
def show_attck(adv_dir, classes=classes):
    plt.figure(figsize=(10, 20))
    cnt = 0
    for i, cls_name in enumerate(classes):
        path = f'{cls_name}/{cls_name}1.png'
        # benign image (read from the original data folder)
        cnt += 1
        plt.subplot(len(classes), 4, cnt)
        im = Image.open(os.path.join(root, path))
        logit = model(transform(im).unsqueeze(0).to(device))[0]
        predict = logit.argmax(-1).item()
        prob = logit.softmax(-1)[predict].item()
        plt.title(f'benign: {cls_name}1.png\n{classes[predict]}: {prob:.2%}')
        plt.axis('off')
        plt.imshow(np.array(im))
        # adversarial image (read from the attack folder)
        cnt += 1
        plt.subplot(len(classes), 4, cnt)
        im = Image.open(os.path.join(adv_dir, path))
        logit = model(transform(im).unsqueeze(0).to(device))[0]
        predict = logit.argmax(-1).item()
        prob = logit.softmax(-1)[predict].item()
        plt.title(f'adversarial: {cls_name}1.png\n{classes[predict]}: {prob:.2%}')
        plt.axis('off')
        plt.imshow(np.array(im))
    plt.tight_layout()
    plt.show()
attack
FGSM method
adv_examples, fgsm_acc, fgsm_loss = gen_adv_examples(model, adv_loader, fgsm, loss_fn)
print(f'fgsm_acc = {fgsm_acc:.5f}, fgsm_loss = {fgsm_loss:.5f}')
adv_dir = 'fgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
fgsm_acc = 0.59000, fgsm_loss = 2.49304
FGSM lowers the accuracy of the target network from benign_acc = 0.95000 (benign_loss = 0.22678) to 0.59, which passes the Simple Baseline
The visualization of the attack on the target network resnet110_cifar10 (using the visualization code above) shows that some images are misclassified and others are not. Because the adversarial images here are generated directly on resnet110_cifar10, this is a white-box attack
I-FGSM method + Ensemble Attack
First check the accuracy of the ensemble model on the benign images
from pytorchcv.model_provider import get_model as ptcv_get_model
benign_acc, benign_loss = epoch_benign(ensemble_model, adv_loader, loss_fn)
print(f'benign_acc = {benign_acc:.5f}, benign_loss = {benign_loss:.5f}')
benign_acc = 0.95000, benign_loss = 0.15440
attack
adv_examples, ifgsm_acc, ifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, ifgsm, loss_fn)
print(f'ensemble_ifgsm_acc = {ifgsm_acc:.5f}, ensemble_ifgsm_loss = {ifgsm_loss:.5f}')
adv_dir = 'ensemble_ifgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
ensemble_ifgsm_acc = 0.00000, ensemble_ifgsm_loss = 13.41135
Passed the Medium Baseline (acc <= 0.50). Take a look at the attack effect on the target network resnet110_cifar10 (using the visualization code above)
MIFGSM + Ensemble Attack(pick right models)
According to Li Hongyi's 2022 Machine Learning HW10 Analysis_Machine Learning Craftsman's Blog-CSDN Blog, the proxy models in the medium baseline were picked arbitrarily, with no principled basis. The paper Query-Free Adversarial Transfer via Undertrained Surrogates (https://arxiv.org/abs/2007.00806) suggests choosing undertrained models instead. Undertrained has two meanings here: the model was trained for fewer epochs, and the model has not yet reached its minimum loss on the validation set. Following an example in the paper, one can use the training code in https://github.com/kuangliu/pytorch-cifar, train a resnet18 for 30 epochs (normal training takes about 200 epochs to reach the best result), and add it to ensembleNet. (This undertrained model is not used below)
adv_examples, ifgsm_acc, ifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, mifgsm, loss_fn)
print(f'ensemble_mifgsm_acc = {ifgsm_acc:.5f}, ensemble_mifgsm_loss = {ifgsm_loss:.5f}')
adv_dir = 'ensemble_mifgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
ensemble_mifgsm_acc = 0.00500, ensemble_mifgsm_loss = 13.23710
Take a look at the attack effect on the target network resnet110_cifar10 (using the visualization code above)
DIM-MIFGSM + Ensemble Attack(pick right models)
adv_examples, ifgsm_acc, ifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, dmi_mifgsm, loss_fn)
print(f'ensemble_dmi_mifgsm_acc = {ifgsm_acc:.5f}, ensemble_dim_mifgsm_loss = {ifgsm_loss:.5f}')
adv_dir = 'ensemble_dmi_mifgsm'
create_dir(root, adv_dir, adv_examples, adv_names)
show_attck(adv_dir)
ensemble_dmi_mifgsm_acc = 0.00000, ensemble_dim_mifgsm_loss = 15.16159
Take a look at the attack effect on the target network resnet110_cifar10 (using the visualization code above)
Passive Defense—JPEG Compression
JPEG compression is done with the imgaug package, with the compression rate set to 70
Reference: imgaug.augmenters.arithmetic — imgaug 0.4.0 documentation
attack
# original image
path = f'dog/dog2.png'
im = Image.open(f'./data/{path}')
logit = model(transform(im).unsqueeze(0).to(device))[0]
predict = logit.argmax(-1).item()
prob = logit.softmax(-1)[predict].item()
plt.title(f'benign: dog2.png\n{classes[predict]}: {prob:.2%}')
plt.axis('off')
plt.imshow(np.array(im))
plt.tight_layout()
plt.show()
# adversarial image
adv_im = Image.open(f'./ensemble_dmi_mifgsm/{path}')
logit = model(transform(adv_im).unsqueeze(0).to(device))[0]
predict = logit.argmax(-1).item()
prob = logit.softmax(-1)[predict].item()
plt.title(f'adversarial: dog2.png\n{classes[predict]}: {prob:.2%}')
plt.axis('off')
plt.imshow(np.array(adv_im))
plt.tight_layout()
plt.show()
defense
import imgaug.augmenters as iaa
# pre-process image
x = transforms.ToTensor()(adv_im)*255
x = x.permute(1, 2, 0).numpy()
x = x.astype(np.uint8)
# use the "imgaug" package to perform JPEG compression (compression rate = 70)
compressed_x = iaa.arithmetic.compress_jpeg(x, compression=70)
logit = model(transform(compressed_x).unsqueeze(0).to(device))[0]
predict = logit.argmax(-1).item()
prob = logit.softmax(-1)[predict].item()
plt.title(f'JPEG adversarial: dog2.png\n{classes[predict]}: {prob:.2%}')
plt.axis('off')
plt.imshow(compressed_x)
plt.tight_layout()
plt.show()
defense succeeded
Extension: file reading
The hand-written Dataset class in the original code is worth studying. It first lists everything in the root folder, sorts the entries, and returns a list
>>dir_list = sorted(glob.glob(f'{root}/*'))
>>print(dir_list)
['./data\\airplane', './data\\automobile', './data\\bird', './data\\cat', './data\\deer', './data\\dog', './data\\frog', './data\\horse', './data\\ship', './data\\truck']
Then read the first folder in the list and take its first file name. These file names can be passed to the Image.open function
>>images = sorted(glob.glob(f'{dir_list[0]}/*'))
>>print(images[0])
./data\airplane\airplane1.png
Get the path relative to root
>>print(os.path.relpath(images[0], root))
airplane\airplane1.png
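The backslashes in the output above come from running on Windows; on Linux or macOS the same calls print forward slashes. If the relative names need to look the same on every platform, one option is to normalize them with pathlib. The sketch below is an illustration under that assumption; `portable_name` is a hypothetical helper that is not in the original code.

```python
from pathlib import PurePosixPath, PureWindowsPath


def portable_name(image_path, root, windows=False):
    """Path of image_path relative to root, always written with forward slashes."""
    path_cls = PureWindowsPath if windows else PurePosixPath
    return path_cls(image_path).relative_to(path_cls(root)).as_posix()


# a Windows-style path as printed above, and its POSIX counterpart
print(portable_name('./data\\airplane\\airplane1.png', './data', windows=True))
print(portable_name('./data/airplane/airplane1.png', './data'))
```

The pure path classes only manipulate strings, so both variants can be tried on any operating system without touching the file system.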