[Paper Reading Notes 63] Span-based Joint Entity and Relation Extraction with Transformer Pre-training

1. Basic Information

Title	Authors and affiliation	Venue	Year
Span-based Joint Entity and Relation Extraction with Transformer Pre-training	Markus Eberts, Adrian Ulges, RheinMain University of Applied Sciences	ECAI	2019

76 Citations, 50 References

Paper link: https://arxiv.org/abs/1909.07755
Code: https://github.com/markus-eberts/spert

2. Key Points

Research topic	Background	Core method	Highlights	Datasets	Conclusions	Paper type	Keywords
Information extraction (span-based joint model)	Approaches based on BIO/BILOU labels cannot identify overlapping entities, while span-based approaches can.	0. Generate a set of candidate spans (arbitrary token subsequences); 1. classify each span into entity types; 2. filter non-entities; 3. classify all pairs of remaining entities into relations.	1. Unlike pipeline methods, the approach detects entities among all token subsequences (or spans). 2. Span filtering: spans that are not entities are filtered out. 3. Each sentence passes through BERT only once (single-pass).	CoNLL04, SciERC, ADE	1. Outperforms prior work by up to 2.6% F1 score; 2. demonstrates the benefits of pre-training, strong negative sampling and localized context.		BERT, SpERT, single-pass, strong negative sampling, localized context

Possible weaknesses: where do the candidate spans come from? If there are too many candidate spans, does efficiency suffer?

3. Model

The paper proposes SpERT, the "Span-based Entity and Relation Transformer".

(a) Span Classification

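Our span classifier takes an arbitrary candidate span as input.
Sentence --[tokenizer]--> BPE tokens --[BERT]--> (e_1, e_2, …, e_n, c), where c is a special classifier token computed from the context of the whole sentence.
A span is written s := (e_i, e_{i+1}, …, e_{i+k}), i.e. a run of k+1 consecutive subword tokens.
Entity types: E ∪ {none}, where E is the set of predefined entity types and none marks spans that are not entities.
The dashed box in the figure marks span classification.
The input has three parts: the span's BERT embeddings (red) + a span width embedding (blue) + the classifier token c (green).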

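The three equations amount to this: concatenate the three input parts into one long vector and feed it into a fully connected layer followed by a softmax classifier.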
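The concatenate-then-softmax step can be sketched in plain Python. This is only an illustration under assumptions: the span tokens are fused by max-pooling (assumed here), and the names `classify_span`, `W`, `b` and all toy numbers are hypothetical, not taken from the SpERT code.

```python
import math

def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]

def softmax(logits):
    """Numerically stable softmax."""
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_span(token_embs, width_emb, cls_emb, W, b):
    """Return one probability per class in E ∪ {none}."""
    # Span representation: pooled token embeddings + width embedding.
    span_repr = max_pool(token_embs) + width_emb
    # Classifier input: span representation + classifier token c.
    x = span_repr + cls_emb
    # Single fully connected layer, then softmax.
    logits = [sum(w * xi for w, xi in zip(row, x)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)

# Toy data (made up): 2-dim token embeddings, 1-dim width embedding, 3 classes.
token_embs = [[0.1, 0.9], [0.7, 0.2]]
width_emb = [0.5]
cls_emb = [0.3, -0.3]
W = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]
probs = classify_span(token_embs, width_emb, cls_emb, W, b)
```

In a real model the weights are learned and the embeddings come from BERT; the sketch only shows how the three inputs are joined before classification.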

(b) Span Filtering

Spans classified as none are filtered out. Spans longer than 10 tokens are discarded beforehand.
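A minimal sketch of how candidates might be enumerated and filtered, assuming candidates are simply all token subsequences up to the width limit. The function names and the predicted labels below are hypothetical.

```python
def enumerate_spans(num_tokens, max_width=10):
    """All (start, end) token ranges, end exclusive, with width <= max_width."""
    return [(start, start + width)
            for start in range(num_tokens)
            for width in range(1, max_width + 1)
            if start + width <= num_tokens]

def filter_spans(spans, predicted_types):
    """Keep only spans whose predicted entity type is not 'none'."""
    return [(span, label) for span, label in zip(spans, predicted_types)
            if label != "none"]

# A 4-token sentence with a width limit of 2 yields 7 candidates.
spans = enumerate_spans(num_tokens=4, max_width=2)

# Hypothetical classifier output, one label per candidate span.
predicted = ["Per", "none", "none", "none", "Loc", "none", "none"]
entities = filter_spans(spans, predicted)
```

With a fixed width limit the number of candidates grows linearly in sentence length (155 spans for a 20-token sentence at width 10), which bears on the efficiency question raised above.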

(c) Relation Classification

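R is the set of predefined relation types.
Each candidate pair (s1, s2) of remaining entities is classified.
Input: the BERT/width embeddings e(s1), e(s2) + a localized context (the part of the sentence between the two entities, combined by max-pooling; yellow). The localized context replaces c: c represents the whole sentence, whereas the localized context is better at capturing what is specific to these two entities. It is written c(s1, s2). In particular, if the range between s1 and s2 is empty (for example when they overlap), c(s1, s2) = 0.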
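The localized context can be sketched as follows, assuming spans are (start, end) index pairs with an exclusive end. `localized_context` and `relation_input` are hypothetical helper names, and the concatenation order in `relation_input` is an assumption made for illustration.

```python
def localized_context(token_embs, span1, span2):
    """Max-pool the embeddings of the tokens strictly between two spans.

    Returns a zero vector when there is nothing between them
    (adjacent, overlapping, or nested spans).
    """
    first, second = sorted([span1, span2])
    between = token_embs[first[1]:second[0]]
    if not between:
        return [0.0] * len(token_embs[0])
    return [max(column) for column in zip(*between)]

def relation_input(span_repr1, context, span_repr2):
    """Concatenate both entity representations around the context."""
    return span_repr1 + context + span_repr2

# Toy embeddings (made up) for a 4-token sentence.
token_embs = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 0.0]]
ctx = localized_context(token_embs, (0, 1), (3, 4))      # tokens 1 and 2
ctx_overlap = localized_context(token_embs, (0, 2), (1, 3))
x_r = relation_input([0.5], ctx, [0.25])
```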

3.1 Training

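Parameters to learn: the width embeddings and the classifier parameters (Ws, bs, Wr, br), with BERT fine-tuned at the same time.
Loss: L = span classification loss (cross-entropy over the entity classes including none) + binary cross-entropy over relation classes.
Span classification is trained with positive and negative samples, and so is relation classification (negative samples are pairs of ground-truth entities in a sentence that carry no relation label, for example a labeled pair with subject and object swapped). All samples are built within a single sentence; none cross sentence boundaries.
Each sentence is encoded by BERT only once.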
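The within-sentence negative sampling could look roughly like this. It is a sketch under assumptions: span negatives are drawn from non-entity spans up to a width limit, relation negatives from ordered pairs of gold entities without a relation label. The function names, sample sizes and relation labels are hypothetical.

```python
import random
from itertools import permutations

def sample_span_negatives(num_tokens, gold_spans, max_negatives,
                          max_width=10, seed=0):
    """Draw spans of the sentence that are not gold entities."""
    gold = set(gold_spans)
    candidates = [(s, s + w)
                  for s in range(num_tokens)
                  for w in range(1, max_width + 1)
                  if s + w <= num_tokens and (s, s + w) not in gold]
    rng = random.Random(seed)
    return rng.sample(candidates, min(max_negatives, len(candidates)))

def sample_relation_negatives(gold_spans, gold_relations, max_negatives, seed=0):
    """Draw ordered pairs of gold entities that have no relation label."""
    labeled = {(head, tail) for head, _, tail in gold_relations}
    candidates = [pair for pair in permutations(gold_spans, 2)
                  if pair not in labeled]
    rng = random.Random(seed)
    return rng.sample(candidates, min(max_negatives, len(candidates)))

# Toy sentence with 5 tokens, 3 gold entities and 2 gold relations.
gold_spans = [(0, 1), (2, 3), (4, 5)]
gold_relations = [((0, 1), "mother", (2, 3)), ((2, 3), "teacher", (4, 5))]

span_negs = sample_span_negatives(5, gold_spans, max_negatives=4, max_width=2)
rel_negs = sample_relation_negatives(gold_spans, gold_relations, max_negatives=10)
```

Because every sample comes from the same sentence, the BERT encoding of that sentence can be reused for all positive and negative samples.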

4. Experiments and Analysis

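Datasets: CoNLL04, SciERC, ADE
Default BERT model: BERT_BASE (cased); for SciERC: SciBERT (cased)
BERT's parameters are also updated (fine-tuned) during training.
The analysis covers:
1. Comparison with state of the art
2. Candidate selection and negative sampling
3. Localized context (the text between the two entities)
4. Pre-training and entity representation: studies the effect of the pre-trained BERT model.
5. Error inspection: case analysis.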

5. Summary (weaknesses, applicability, writing framework)

more elaborate forms of context for relation classifiers;
employing additional syntactic features or learned context.

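Question: where do the candidate spans come from? If there are too many candidate spans, does efficiency suffer? The paper only gives one sentence on this: ""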

6. Knowledge Notes (key points, literature categories)

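Joint Entity and Relation Extraction:

Miwa and Sasaki [23] treat joint entity and relation extraction as a table-filling problem in which each cell corresponds to a pair of words in the sentence. Gupta et al. [10] also use table filling but, unlike the former, employ a bidirectional RNN.

Miwa and Bansal [22] use the BILOU scheme; Zhou et al. [42] use a BILOU-based combination of LSTM and CNN.

...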

Most similar to this paper: [18] Xiaoya Li, Fan Yin, Zijun Sun, Xiayu Li, Arianna Yuan, Duo Chai, Mingxin Zhou, and Jiwei Li, ‘Entity-Relation Extraction as Multi-Turn Question Answering’, in Proc. of ACL 2019, pp. 1340–1350, Florence, Italy, (July 2019). ACL.

That paper also seems to use prompt-like templates.

Span-based Approaches

BIO/BILOU-based tagging cannot handle overlapping entities.
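A small sketch of why one tag per token breaks down on overlap, while a list of labeled spans does not. The sentence, labels and the `spans_to_bio` helper are illustrative assumptions.

```python
def spans_to_bio(num_tokens, entities):
    """Convert (start, end, label) entities (end exclusive) to BIO tags.

    Raises ValueError if two entities share a token, because each token
    can carry only one tag.
    """
    tags = ["O"] * num_tokens
    for start, end, label in entities:
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps a tagged entity")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

tokens = ["Ford", "'s", "Chicago", "plant"]
flat = [(0, 1, "Org"), (2, 3, "Loc")]
nested = flat + [(0, 4, "Fac")]   # covers the two shorter entities

flat_tags = spans_to_bio(len(tokens), flat)

try:
    spans_to_bio(len(tokens), nested)
    nested_ok = True
except ValueError:
    nested_ok = False
```

The span list `nested` itself stores all three entities without conflict, which is the representation a span-based model classifies directly.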

7. References

[1] Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer, ‘End-to-end Neural Coreference Resolution’, in Proc. of EMNLP 2017, pp. 188–197, Copenhagen, Denmark, (September 2017). ACL.