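A Curated List of Papers on Adversarial Attacks

Adversarial Attacks

Parts of this post quote other sources; if anything infringes, let me know and I will remove it.
These are my personal interpretations; if anything is wrong, corrections are welcome.

The Concept of Adversarial Attacks

An adversarial attack deliberately adds a barely perceptible perturbation to an input sample from the dataset so that the model produces a wrong output.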

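How Adversarial Attacks Work

"Intriguing properties of neural networks" (2013) [1]
Paper link: http://de.arxiv.org/pdf/1312.6199
Reference notes: https://zhuanlan.zhihu.com/p/217683614?utm_source=qq
Contributions:
① The semantic information in a neural network is carried not by any single neuron but by the space represented by the whole network (or, more precisely, by a given layer).
② The nonlinearity of deep neural networks makes the input-output mapping discontinuous; together with overfitting caused by insufficient model averaging and insufficient regularization, this makes adversarial attacks possible.
③ Uses L-BFGS to compute an approximation of the adversarial example.
④ Proposes adding adversarial examples to the training set to improve model robustness.
"Explaining And Harnessing Adversarial Examples" (2014) [2]
Paper link: http://de.arxiv.org/pdf/1412.6572
Reference notes: https://zhuanlan.zhihu.com/p/33875223
Contributions:
① Linearity in a high-dimensional space is already enough to produce adversarial examples; the vulnerability of deep models to adversarial examples comes mainly from their linear components. Switching to a nonlinear RBF model reduces a network's vulnerability to adversarial attacks.
② Proposes the Fast Gradient Sign Method (FGSM) for generating adversarial examples.
③ Modifying the training objective based on FGSM acts as an effective regularizer.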

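Adversarial Attack Methods

Based on the Target Model's Loss Function

L-BFGS
Paper: "Intriguing properties of neural networks" (2013) [1]
Reference notes: https://zhuanlan.zhihu.com/p/217683614?utm_source=qq
Core idea: use the quasi-Newton optimizer L-BFGS, under a box constraint, to jointly minimize the size of the perturbation r and the loss of the perturbed input x + r on the attack's target class l, which yields an approximately minimal perturbation.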
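The objective described above can be written out as follows. This is a sketch in the notation of [1] as I read it: x is the input with m pixels scaled to [0, 1], r is the perturbation, l is the target label, loss_f is the classifier's loss, and c > 0 is a trade-off weight chosen by line search (the smallest c for which the minimizer r is classified as l).

```latex
\begin{aligned}
\min_{r}\quad & c\,\lVert r \rVert_2 + \mathrm{loss}_f(x + r,\; l) \\
\text{subject to}\quad & x + r \in [0, 1]^m
\end{aligned}
```

The box constraint is what makes the bounded variant of L-BFGS a natural fit: it keeps every pixel of x + r in the valid range during optimization.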
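FGSM -> IFGSM (also known as BIM), ILCM -> R+FGSM
Papers: FGSM: "Explaining And Harnessing Adversarial Examples" (2014) [2]
IFGSM (also known as BIM), ILCM: "Adversarial examples in the physical world" (2016)
R+FGSM: "Ensemble Adversarial Training: Attacks and Defenses" (2017)
Paper links: FGSM: http://de.arxiv.org/pdf/1412.6572
IFGSM (also known as BIM), ILCM: http://arxiv.org/pdf/1607.02533
Reference notes: FGSM: https://zhuanlan.zhihu.com/p/33875223
IFGSM (also known as BIM), ILCM: https://www.jianshu.com/p/2f3b15617236
My notes: The Fast Gradient Sign Method (FGSM) and Its Improvements

Core idea:
FGSM: add a perturbation η = ε · sign(∇_x J(θ, x, y)) along the sign of the gradient of the loss with respect to the input, i.e. the direction in which the loss increases; ε controls the size of the perturbation.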
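IFGSM: generates the adversarial example over several iterations with a small step each time; the attack is stronger and the perturbation is smaller.

R+FGSM: takes a small random step before the single FGSM step. In the same paper, the authors approximate the solution of the inner maximization problem of adversarial training by the output of a single-step attack, which makes generating adversarial examples cheaper and lets adversarial training scale to large datasets.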
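The three variants above can be sketched on a toy model. This is a minimal illustration, not the papers' code: the model is a hypothetical logistic regression with made-up weights `w`, `b`, inputs are assumed to lie in [0, 1], and the step sizes `eps` and `alpha` are arbitrary.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo=0.0, hi=1.0):
    return max(lo, min(hi, v))

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss J with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """Single step of size eps along sign(grad_x J)."""
    g = input_gradient(w, b, x, y)
    return [clip(xi + eps * sign(gi)) for xi, gi in zip(x, g)]

def ifgsm(w, b, x, y, eps, alpha, steps):
    """Several steps of size alpha, projected back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [clip(xa + alpha * sign(gi)) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def r_fgsm(w, b, x, y, eps, alpha, rng):
    """Random step of size alpha, then an FGSM step of size eps - alpha."""
    x_rand = [clip(xi + alpha * sign(rng.gauss(0, 1))) for xi in x]
    g = input_gradient(w, b, x_rand, y)
    return [clip(xr + (eps - alpha) * sign(gi)) for xr, gi in zip(x_rand, g)]

# Hypothetical toy model and input (true label y = 1).
w, b = [2.0, -3.0, 1.0], 0.1
x, y, eps = [0.6, 0.2, 0.5], 1, 0.25

x_fgsm = fgsm(w, b, x, y, eps)
x_ifgsm = ifgsm(w, b, x, y, eps, alpha=0.05, steps=10)
x_rfgsm = r_fgsm(w, b, x, y, eps, alpha=0.05, rng=random.Random(0))
print(predict(w, b, x), predict(w, b, x_fgsm), predict(w, b, x_ifgsm))
```

All three keep the perturbation inside an L∞ ball of radius `eps`; they differ only in how that budget is spent.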

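Based on the Target Model's Network Structure

JSMA (Jacobian-based Saliency Map Attack)
Paper: "The limitations of deep learning in adversarial settings" (2015)
Paper link: http://lanl.arxiv.org/pdf/1511.07528.pdf
Reference notes: https://blog.csdn.net/qq_36415775/article/details/89205794
Code: https://github.com/gongzhitaao/tensorflow-adversarial/tree/master/example
Core idea: using a saliency map built from the Jacobian, iteratively perturb the pixels for which the network's forward derivative is largest (the larger the derivative, the more the output y changes).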
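The saliency map for a targeted attack that increases features can be sketched as below. This is a simplified illustration: the Jacobian is passed in as a plain list of lists (`jacobian[j][i]` = ∂F_j/∂x_i), the example values are made up, and features are scored one at a time, whereas the paper searches over pairs of pixels.

```python
def saliency_map(jacobian, target):
    """Score each input feature for pushing the output toward `target`.

    A feature gets score 0 if increasing it lowers the target output or
    raises the other outputs in total; otherwise its score is
    dF_target/dx_i * |sum over j != target of dF_j/dx_i|.
    """
    n_features = len(jacobian[0])
    scores = []
    for i in range(n_features):
        d_target = jacobian[target][i]
        d_others = sum(row[i] for j, row in enumerate(jacobian) if j != target)
        if d_target < 0 or d_others > 0:
            scores.append(0.0)
        else:
            scores.append(d_target * abs(d_others))
    return scores

# Hypothetical Jacobian of a 3-class model with 3 input features.
jacobian = [
    [1.0, -2.0, 0.5],
    [-1.0, 1.0, 0.5],
    [0.0, 1.0, -1.0],
]
saliency = saliency_map(jacobian, target=0)
best_feature = max(range(len(saliency)), key=lambda i: saliency[i])
print(saliency, best_feature)
```

In the full attack this scoring is repeated after every small change to the chosen feature, until the model outputs the target class or a distortion budget is used up.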
DeepFool -> Universal Adversarial Perturbations
Papers: "DeepFool: a simple and accurate method to fool deep neural networks" (CVPR 2016)
"Universal Adversarial Perturbations" (2017, IEEE)
Paper links: DeepFool: https://www.cv-foundation.org/openaccess/content_cvpr_2016/app/S12-10.pdf
Universal Adversarial Perturbations: https://arxiv.org/pdf/1610.08401v3.pdf
Reference notes: https://www.dazhuanlan.com/2019/12/09/5dee1b61a6844/
Code: https://github.com/LTS4/universal
Core idea:
DeepFool: computes the minimal distance from the sample to the classification hyperplane (the decision boundary) and uses it to generate a minimal perturbation.

Universal Adversarial Perturbations: accumulates perturbations over many images to find a single, more universal perturbation (one that can fool the model on many different images).
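For a linear binary classifier f(x) = w·x + b, the DeepFool perturbation has a closed form: the projection of x onto the hyperplane f(x) = 0. The sketch below covers only this linear case, with made-up weights; for a nonlinear multi-class network the paper applies the same step repeatedly to a local linearization. The small `overshoot` factor pushes the point just across the boundary.

```python
import math

def decision_value(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def deepfool_linear(w, b, x, overshoot=0.02):
    """Minimal L2 perturbation that moves x onto the hyperplane w.x + b = 0."""
    f = decision_value(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    r = [-(f / norm_sq) * wi for wi in w]
    x_adv = [xi + (1.0 + overshoot) * ri for xi, ri in zip(x, r)]
    return r, x_adv

# Hypothetical classifier and input.
w, b = [3.0, 4.0], -5.0
x = [3.0, 4.0]

r, x_adv = deepfool_linear(w, b, x)
perturbation_norm = math.sqrt(sum(ri * ri for ri in r))
print(perturbation_norm, decision_value(w, b, x), decision_value(w, b, x_adv))
```

The norm of `r` equals |f(x)| / ‖w‖, which is exactly the distance from x to the hyperplane.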
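One Pixel Attack
Paper: "One Pixel Attack for Fooling Deep Neural Networks" (2017)
Paper link: https://arxiv.org/abs/1710.08864
Code: https://github.com/Hyperparticle/one-pixel-attack-keras
Reference notes:
Core idea: use the differential evolution optimization algorithm, with the number of modifiable pixels constrained, to maximize the probability that the adversarial example is classified as the adversarial label, which yields the perturbation e(x).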
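A minimal sketch of the search, under heavy simplifications: the "image" is a flat list of four values in [0, 1], the model is a hypothetical logistic regression whose output stands in for the adversarial-label probability, and the population size, number of generations, and scale factor `F` are arbitrary. Each candidate encodes one pixel position and its new value.

```python
import math
import random

def target_prob(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def apply_pixel(x, cand):
    """Return a copy of x with the single pixel encoded by cand overwritten."""
    idx = int(cand[0])
    x_adv = list(x)
    x_adv[idx] = cand[1]
    return x_adv

def one_pixel_attack(w, b, x, pop_size=20, generations=30, F=0.5, seed=0):
    rng = random.Random(seed)
    n = len(x)

    def fitness(cand):
        return target_prob(w, b, apply_pixel(x, cand))

    pop = [(rng.uniform(0, n - 1e-9), rng.uniform(0, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.sample(range(pop_size), 3)
            # Differential evolution mutation: a + F * (b - c), then clip to valid ranges.
            pos = pop[r1][0] + F * (pop[r2][0] - pop[r3][0])
            val = pop[r1][1] + F * (pop[r2][1] - pop[r3][1])
            child = (min(max(pos, 0.0), n - 1e-9), min(max(val, 0.0), 1.0))
            # The child replaces its parent only if it is fitter.
            if fitness(child) > fitness(pop[i]):
                pop[i] = child
    best = max(pop, key=fitness)
    return apply_pixel(x, best)

# Hypothetical model and input.
w, b = [0.5, -4.0, 1.0, 0.2], 0.0
x = [0.5, 0.9, 0.5, 0.5]

x_adv = one_pixel_attack(w, b, x)
changed = sum(1 for a, o in zip(x_adv, x) if a != o)
print(changed, target_prob(w, b, x), target_prob(w, b, x_adv))
```

The search only queries the model's output probability and never its gradients, which is why differential evolution suits this setting.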
C&W (the Carlini and Wagner attack)
Paper: "Towards evaluating the robustness of neural networks" (2017.3)
Paper link: http://arxiv.org/abs/1608.04644v1
Code: https://github.com/carlini/nn_robust_attacks
Reference notes: https://zhuanlan.zhihu.com/p/266726084
Core idea: this is the paper I understand least; in particular, I do not follow how the constraint C(x+r)=t is converted into f(x+r)<=0.
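One possible reading of that conversion, offered as a sketch rather than a full account of the paper: the constraint C(x+r)=t is hard to optimize directly, so the paper picks a function f with the property that C(x+r)=t exactly when f(x+r)<=0, and then moves f into the objective (minimize the size of r plus c·f(x+r)). One of the candidate functions in the paper compares the target logit with the largest other logit. The helper below is a hypothetical re-implementation of that function on a plain list of logits, with `kappa` as the confidence margin.

```python
def cw_objective(logits, target, kappa=0.0):
    """max(max_{i != target} Z_i - Z_target, -kappa).

    With kappa = 0 the value is <= 0 exactly when the target logit is at
    least as large as every other logit, i.e. the model predicts `target`.
    """
    best_other = max(z for i, z in enumerate(logits) if i != target)
    return max(best_other - logits[target], -kappa)

# Made-up logits whose argmax is class 1.
logits = [1.0, 3.0, 2.0]
f_when_predicted = cw_objective(logits, target=1)  # class 1 is the prediction
f_when_not = cw_objective(logits, target=0)        # class 0 is not
print(f_when_predicted, f_when_not)
```

Because f is positive only while the attack has not yet succeeded, minimizing it drives the input toward the target class, and it stops contributing once the target class is reached.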


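Based on the Target Model's Predicted Probabilities

Based on the Target Model's Predicted Labels

Boundary Attack
Paper: "Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machi" (2018)
Paper link: https://arxiv.org/pdf/1712.04248.pdf
Code: https://github.com/greentfrapp/boundary-attack
Reference notes: https://zhuanlan.zhihu.com/p/67320040
Core idea: ① Initialize the adversarial example: an untargeted attack starts from a random sample, a targeted attack starts from an image of the target class.
② Move the sample step by step from this starting point toward the original sample, keeping it adversarial at every step.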
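The two steps above can be sketched as a random walk that only ever asks the model for its decision. This is a simplified illustration: the "model" is a hypothetical two-feature rule, the step sizes `delta` and `eps` are fixed rather than adapted, and clipping to the valid pixel range is omitted.

```python
import math
import random

def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def boundary_attack(original, start, is_adv, steps=500, delta=0.1, eps=0.05, seed=0):
    """Walk from an adversarial start point toward `original`, staying adversarial."""
    rng = random.Random(seed)
    x_adv = list(start)
    for _ in range(steps):
        dist = distance(x_adv, original)
        # 1) Random step whose size is proportional to the current distance.
        noise = [rng.gauss(0, 1) for _ in original]
        norm = math.sqrt(sum(n * n for n in noise))
        cand = [xa + delta * dist * n / norm for xa, n in zip(x_adv, noise)]
        # 2) Project back onto the sphere of radius `dist` around the original.
        scale = dist / distance(cand, original)
        cand = [o + scale * (c - o) for c, o in zip(cand, original)]
        # 3) Small step toward the original.
        cand = [o + (1.0 - eps) * (c - o) for c, o in zip(cand, original)]
        # Keep the candidate only if the model still makes the wrong decision.
        if is_adv(cand):
            x_adv = cand
    return x_adv

# Hypothetical black-box model: label 1 if x0 + x1 > 1, else 0.
def is_adversarial(x):
    return (x[0] + x[1] > 1.0) != False  # original sample has label 0

original = [0.2, 0.2]
start = [0.9, 0.9]  # already classified as label 1
x_adv = boundary_attack(original, start, is_adversarial)
print(distance(start, original), distance(x_adv, original))
```

Every accepted step shrinks the distance to the original by the factor (1 - eps), so the walk ends up hugging the decision boundary.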

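Defenses Against Adversarial Attacks

Adversarial Training:
The defender constructs adversarial attacks itself and adds the resulting perturbed examples to the training data, augmenting the training set so that the trained model becomes more robust.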
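A minimal sketch of this loop, assuming a toy logistic regression, FGSM as the defender's attack, and made-up data and hyperparameters: at every step the model is updated on the clean sample and on an adversarial sample crafted against the current weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if logit(w, b, x) >= 0 else 0

def fgsm(w, b, x, y, eps):
    """FGSM against logistic regression; inputs are clipped to [0, 1]."""
    p = sigmoid(logit(w, b, x))
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * ((g > 0) - (g < 0)))) for xi, g in zip(x, grad)]

def adversarial_train(data, eps, epochs=200, lr=0.5):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            # Train on the clean sample and on its FGSM counterpart.
            for sample in (x, fgsm(w, b, x, y, eps)):
                p = sigmoid(logit(w, b, sample))
                w = [wj - lr * (p - y) * sj for wj, sj in zip(w, sample)]
                b -= lr * (p - y)
    return w, b

# Hypothetical, well-separated toy data.
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.8], 1)]
eps = 0.1

w, b = adversarial_train(data, eps)
clean_acc = sum(predict(w, b, x) == y for x, y in data) / len(data)
robust_acc = sum(predict(w, b, fgsm(w, b, x, y, eps)) == y for x, y in data) / len(data)
print(clean_acc, robust_acc)
```

Crafting the adversarial samples against the current weights, rather than once up front, is what separates adversarial training from plain data augmentation.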
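Papers:
① "Intriguing properties of neural networks" (2013) [1]

② "Ensemble Adversarial Training: Attacks and Defenses" (2017): proposes ensemble adversarial training
Reference notes: https://www.cnblogs.com/gris3/p/12688506.html

Criticism of this approach:
"Towards deep learning models resistant to adversarial attacks"
notes that adversarial training against relatively weak attacks often does not make the model more robust to stronger attacks.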

Adversarial Example Detection
Papers:
① "Early methods for detecting adversarial images" (2017)
② "Feature Squeezing Mitigates and Detects Carlini/Wagner Adversarial Examples" (2017)
③ "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks" (2018)

Paper link: ① https://openreview.net/pdf?id=B1dexpDug

Code: ① https://github.com/hendrycks/fooling
Criticism of this approach:
① "Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods" (2017)
Paper link: http://arxiv.org/pdf/1705.07263

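Adversarial Example Restoration and Denoising
Papers:
① "Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser"
Paper link: http://arxiv.org/pdf/1712.02976
② "ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples"
Gradient Masking
Distillation:
Papers:
① "Distillation as a defense to adversarial perturbations against deep neural networks" (2016.3)
This method improves model robustness in two steps. First, train a classifier whose final-layer logits are divided by a constant T before the softmax. Second, train a second model on the same inputs, but instead of the original labels use the probability vector output by the first model as the target for the final softmax layer.
② "Extending Defensive Distillation" (2017.5)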
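The role of the constant T in the two-step distillation procedure above can be seen in a small sketch. The logits and the temperature values are made up: dividing the logits by a larger T before the softmax gives softer probabilities, and these are the soft labels the second model is trained on.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax of logits / T, computed in a numerically stable way."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the first (teacher) model.
logits = [4.0, 1.0, 0.0]
probs_t1 = softmax_with_temperature(logits, T=1.0)    # sharp distribution
probs_t20 = softmax_with_temperature(logits, T=20.0)  # soft labels for step two
print(probs_t1)
print(probs_t20)
```

The ranking of the classes is unchanged; only the confidence is flattened, so the soft labels also carry information about how similar the classes are.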

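Criticism of this approach:
"Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples"
This paper identifies a phenomenon it calls "obfuscated gradients", which gives defenses against adversarial examples a false sense of security. In its case study, the authors examined 8 papers accepted at ICLR 2018 and found obfuscated gradients to be common: 7 of them relied on obfuscated gradients and were circumvented, fully or in part, by the paper's new attack techniques.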

Detection systems: (add an extra detector in front of the target network to decide whether an input is an artificially perturbed adversarial example)
① Perform statistical tests: "On the (statistical) detection of adversarial examples" (2017.2)
② Use an additional model for detection: "Adversarial and clean data are not twins" (2017.4)
③ "On detecting adversarial perturbations" (2017.2)
④ Apply dropout at test time: "Detecting adversarial samples from artifacts" (2017.3)
Preprocessing
① Add a randomization layer: "Mitigating Adversarial Effects Through Randomization" (2017)
Using GANs
① Generative Adversarial Networks (GAN): "Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN" (2017.5)
② "AE-GAN: adversarial eliminating with GAN" (2017.7)
Paper notes: (https://www.zybuluo.com/wuxin1994/note/881171)
③ "Efficient Defenses Against Adversarial Attacks" (2017.7)
Paper notes: (https://www.zybuluo.com/wuxin1994/note/863551)

Examples of Adversarial Attacks

Adversarial Image Patches
Paper: "Adversarial Patch"
Paper link: http://arxiv.org/abs/1712.09665
Contribution: proposes a method for creating universal, robust, targeted adversarial image patches in the real world.
Object Recognition
Paper: "Robust Physical Adversarial Attack on Faster R-CNN Object Detector"
Paper link: https://arxiv.org/abs/1804.05810
Face recognition:
"Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition" (2016.10)
Photographed images:
"Adversarial examples in the physical world" (2017.2)
This paper addresses the practical setting in which an attacker often cannot feed a digital adversarial example directly to the target classifier. The example has to be printed on paper and reach the target network through a camera or similar device. Because the added perturbation is small, the distortion introduced by photographing can destroy it, so the attack may fail.
Autonomous vehicles:
"Robust Physical-World Attacks on Machine Learning Models" (2017.7)
"Note on Attacking Object Detectors with Adversarial Stickers"
"Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning" (2017)

"Adversarial Perturbations Against Deep Neural Networks for Malware Classification" (2016.6)

Paper notes: (https://www.zybuluo.com/wuxin1994/note/854417)

"Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN" (2017.5)

Paper notes: (https://www.zybuluo.com/wuxin1994/note/867495)

3D printing:
"Synthesizing Robust Adversarial Examples"

[1]: Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[2]: Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.