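ShowMeAI Research Center

AI Safety, Bias, and Fairness

ShowMeAI has translated all the slides of Stanford's CS224n course, Natural Language Processing with Deep Learning, into Chinese, annotated them, and turned them into animated GIFs. See the end of the article for how to get the videos, slides, and other materials.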

1.Bias in the Vision and Language of Artificial Intelligence

2.Prototype Theory

What do you see?

Bananas
Stickers
Dole Bananas
Bananas at a store
Bananas on shelves
Bunches of bananas
Bananas with stickers on them
Bunches of bananas with stickers on them on shelves in a store

...We don’t tend to say Yellow Bananas

What do you see?

Prototype Theory
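- One purpose of categorization is to reduce the infinite differences among stimuli to behaviorally and cognitively usable proportions
- Some central, prototypical notions of items may arise from the stored typical properties of an object category (Rosch, 1975)
- Exemplars may also be stored (Wu & Barsalou, 2009)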

Prototype Theory

Doctor —— Female Doctor
Most subjects overlooked the possibility that the doctor could be a woman, including men, women, and self-described feminists.

World Learning from text

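Human Reporting Bias
- The word "murdered" appears ten times as often as "blinked"
- We don't tend to mention things like blinking and breathing

Human Reporting Bias
- The frequency with which people write about actions, outcomes, or properties does not reflect real-world frequencies, nor the degree to which a property is characteristic of a class of individuals.
- It says more about how we process the world and what we find remarkable. This affects everything we learn.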
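
The gap between text frequency and real-world frequency can be sketched in a few lines of Python. This is only an illustration: the corpus is invented, the "real-world" rates are assumed placeholder values rather than measurements, and `mention_ratio` is a hypothetical helper.

```python
from collections import Counter

# Toy corpus (invented for illustration): dramatic events get written about,
# mundane ones almost never do.
corpus = (
    "the victim was murdered last night . "
    "police say the man was murdered . "
    "she blinked and the suspect was gone . "
    "a second witness was murdered ."
).split()

# Assumed, made-up real-world rates (events per person per day).
real_world_rate = {"blinked": 15000.0, "murdered": 0.00001}

counts = Counter(corpus)


def mention_ratio(word_a, word_b):
    """How many times more often word_a is mentioned than word_b in the corpus."""
    return counts[word_a] / counts[word_b]


text_ratio = mention_ratio("murdered", "blinked")
world_ratio = real_world_rate["murdered"] / real_world_rate["blinked"]

print(f"text:  murdered/blinked = {text_ratio:.1f}")
print(f"world: murdered/blinked = {world_ratio:.2e}")
```

A model that only sees the corpus learns the first ratio, not the second, which is the sense in which reporting bias shapes what is learned from text.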

Human Reporting Bias in Data

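Data
- Reporting bias: What people share is not a reflection of real-world frequencies
- Selection bias: Selection does not reflect a random sample
- Out-group homogeneity bias: People tend to see outgroup members as more alike than ingroup members when comparing attitudes, values, personality traits, and other characteristics

Interpretation
- Confirmation bias: The tendency to search for, interpret, favor, and recall information in a way that confirms one's preexisting beliefs or hypotheses
- Overgeneralization: Coming to a conclusion based on information that is too general and/or not specific enough
- Correlation fallacy: Confusing correlation with causation
- Automation bias: The tendency of humans to favor suggestions from automated decision-making systems over contradictory information that does not come from automation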

3.Biases in Data

Selection bias: Selection does not reflect a random sample
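
A minimal sketch of selection bias, assuming a hypothetical population split evenly between two groups whose measured values differ. All numbers and names here are invented for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: half in group "A" (value 1.0), half in group "B"
# (value 0.0). These values are assumptions chosen to make the effect obvious.
population = [("A", 1.0)] * 5000 + [("B", 0.0)] * 5000


def sample_mean(people):
    """Mean of the measured value over a collection of (group, value) pairs."""
    return statistics.mean(value for _, value in people)


# Random sample: every person is equally likely to be picked.
random_sample = random.sample(population, 1000)

# Biased selection: only group "A" is ever collected,
# for example because the data comes from a single source.
biased_sample = [p for p in population if p[0] == "A"][:1000]

true_mean = sample_mean(population)
print("population mean:   ", true_mean)
print("random sample mean:", sample_mean(random_sample))
print("biased sample mean:", sample_mean(biased_sample))
```

Both samples have the same size; only the selection procedure differs, and only the random sample stays close to the population mean.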

Biases in Data

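Out-group homogeneity bias: People tend to see outgroup members as more alike than ingroup members when comparing attitudes, values, personality traits, and other characteristics
This one is a bit hard to grasp. The idea is that the four cats on the left are very different from one another, but to the dog they all look the same.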

Biases in Data → Biased Data Representation
You may have an appropriate amount of data for every group you can think of, yet some groups may be represented less positively than others.

Biases in Data → Biased Labels
Annotations in a dataset will reflect the worldviews of the annotators.

4.Biases in Interpretation

Biases in Interpretation
- Confirmation bias: The tendency to search for, interpret, favor, and recall information in a way that confirms one's preexisting beliefs or hypotheses

Biases in Interpretation
- Overgeneralization: Coming to a conclusion based on information that is too general and/or not specific enough (related: overfitting)

Biases in Interpretation
- Correlation fallacy: Confusing correlation with causation
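
One common way a correlation arises without causation is a hidden common cause. The sketch below simulates that case under assumed toy settings: `z` is an invented hidden variable that drives both `x` and `y`, and `pearson` is a small helper written here for illustration.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is repeatable
N = 2000


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hidden common cause: z influences both x and y. x never influences y.
z = [random.gauss(0, 1) for _ in range(N)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

# Intervention: set x ourselves, independently of z. y is generated as before,
# so if x caused y we would still see a relationship here.
x_forced = [random.gauss(0, 1) for _ in range(N)]

print("observed corr(x, y):       ", round(pearson(x, y), 2))
print("corr(x, y) after forcing x:", round(pearson(x_forced, y), 2))
```

The observed correlation is strong, yet it vanishes once `x` is set independently, which is what one would expect if `x` has no causal effect on `y`.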

Biases in Interpretation
- Automation bias: The tendency of humans to favor suggestions from automated decision-making systems over contradictory information that does not come from automation

Biases in Interpretation

This forms a feedback loop.
It is known as the Bias Network Effect, or Bias "Laundering".

Human data perpetuates human biases. As ML learns from human data, the result is a bias network effect.


5.BIAS = BAD ??

“Bias” can be Good, Bad, Neutral

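Bias in statistics and ML
- Bias of an estimator: the difference between the predictions and the correct values we are trying to predict
- The "bias" term b (as in y = mx + b)
Cognitive biases
- Confirmation bias, recency bias, optimism bias
Algorithmic bias
- Unjust, unfair, or prejudicial treatment of people related to race, income, sexual orientation, religion, gender, and other characteristics historically associated with discrimination and marginalization, when and where it manifests in algorithmic systems or algorithmically aided decision-making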
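
The statistical sense of "bias" is neutral, and a small worked example may help separate it from the other senses. The sketch below uses the standard textbook definition, the expected value of an estimator minus the true value, on an assumed toy population (a fair 0/1 coin). The helper names are hypothetical.

```python
from itertools import product

values = [0.0, 1.0]  # toy population: a fair 0/1 coin (assumption)
true_var = 0.25      # variance of a fair 0/1 coin
n = 2                # sample size


def var_estimate(sample, ddof):
    """Variance estimate that divides by (len(sample) - ddof)."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)


def estimator_bias(ddof):
    """Expected estimate minus the true variance.

    Every sample of size n is equally likely, so enumerating all of them
    gives the expectation exactly, with no randomness involved.
    """
    samples = list(product(values, repeat=n))
    expected = sum(var_estimate(s, ddof) for s in samples) / len(samples)
    return expected - true_var


print("divide by n:    ", estimator_bias(ddof=0))
print("divide by n - 1:", estimator_bias(ddof=1))
```

Dividing by n underestimates the variance on average, while dividing by n - 1 does not. Neither outcome says anything about fairness, which is why the word "bias" needs to be qualified.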

amplify injustice

How can we avoid algorithmic bias and develop algorithms that do not amplify disparities?

6.Predicting Future Criminal Behavior

Predicting Policing

Predicting Future Criminal Behavior
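- Algorithms identify potential crime hot spots
- They are based on where crime was previously reported, not on where it is known to have occurred
- They predict future events from the past
- What gets predicted is where arrests happen, not where crimes happen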
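
The feedback loop behind these points can be sketched with a deliberately simplified, deterministic simulation. Everything in it is an assumption made for illustration: two hypothetical neighborhoods with identical true crime rates, a single patrol sent to whichever area has the most recorded incidents, and crime recorded only where the patrol is. It is not a model of any real system.

```python
# Both neighborhoods have the SAME true crime rate (assumed toy value).
TRUE_CRIMES_PER_DAY = {"north": 10, "south": 10}

# Fraction of crimes that get recorded where police are present (assumed).
DETECTION_RATE = 0.5


def simulate(initial_records, days):
    """Send one patrol per day to the predicted hot spot and update records."""
    recorded = dict(initial_records)  # copy so the caller's data is untouched
    for _ in range(days):
        # "Predict" the hot spot from past records.
        hot_spot = max(recorded, key=recorded.get)
        # Crime is only recorded where the patrol is; the other area goes unseen.
        recorded[hot_spot] += int(TRUE_CRIMES_PER_DAY[hot_spot] * DETECTION_RATE)
    return recorded


# Historical records differ by a single incident, for example by chance.
start = {"north": 11, "south": 10}
print(simulate(start, days=10))
```

After ten days the records show 61 incidents in "north" and still 10 in "south", even though the underlying crime rates are equal. The records describe where the patrol went, not where crime occurred.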

Predicting Sentencing

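Prater (white) was rated low risk after shoplifting, despite two armed robberies and one attempted armed robbery.
Borden (Black) was rated high risk after she and a friend took a bike and a scooter that were sitting outside (but returned them before the police arrived).
Two years later, Borden had not been charged with any new crimes. Prater was serving an 8-year prison term for grand theft.
The system assumes by default that Black people pose a higher crime risk than white people.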

7.Automation Bias

Predicting Criminality

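Israeli startup Faception
Faception describes itself as first-to-technology and first-to-market with proprietary computer vision and machine learning technology for profiling people and revealing their personality based only on their facial image.
It offers specialized engines for recognizing "High IQ", "White-Collar Offender", "Pedophile", and "Terrorist" from a face image.
Its main clients are in homeland security and public safety.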

Predicting Criminality

“Automated Inference on Criminality using Face Images” Wu and Zhang, 2016. arXiv
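1,856 closely cropped images of faces, including "wanted suspect" ID pictures from specific regions
Confirmation bias and the correlation fallacy are both at work here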

8.Selection Bias + Experimenter’s Bias + Confirmation Bias + Correlation Fallacy + Feedback Loops

Predicting Criminality - The Media Blitz

9.(Claiming to) Predict Internal Qualities Subject To Discrimination

Predicting Homosexuality

Wang and Kosinski, Deep neural networks are more accurate than humans at detecting sexual orientation from facial images, 2017.
“Sexual orientation detector” using 35,326 images from public profiles on a US dating website.
“Consistent with the prenatal hormone theory (PHT) of sexual orientation, gay men and women tended to have gender-atypical facial morphology.”

Predicting Homosexuality

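In selfies, the differences between gay and straight faces relate to grooming, presentation, and lifestyle, that is, to differences in culture, not in facial structure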
See our longer response on Medium, “Do Algorithms Reveal Sexual Orientation or Just Expose our Stereotypes?”
Selection Bias + Experimenter’s Bias + Correlation Fallacy