Latest, Classic, and Must-Read NLP Papers from Top Conferences, Covering Nearly Every Area (2020)

    This resource collects classic, recent, and must-read papers from the major AI conferences in natural language processing over the past few years. It covers nearly every area of NLP, including BERT models, Transformer models, transfer learning, text summarization, sentiment analysis, question answering, machine translation, text generation, quality evaluation, model modifications (multi-task learning, masking strategies, etc.), probing, multilingual models, domain-specific models, multi-modal models, model compression, slot filling, analysis, word segmentation / parsing / NER, pronoun coreference resolution, word sense disambiguation, relation extraction, knowledge bases, and text classification.

    This resource was compiled from the web. Original source: https://github.com/changwookjun/nlp-paper#probe

     

    Download link for the version with paper hyperlinks:

    Link: https://pan.baidu.com/s/1gySZ2Yn3IIpMREB17fDKDg

    Extraction code: nrp7

     

Table of Contents

    BERT Models

    Transformer Models

    Transfer Learning

    Text Summarization

    Sentiment Analysis

    Question Answering

    Machine Translation

    Downstream Tasks

            Dialogue Systems

            Slot Filling

            Analysis

            Word Segmentation, Parsing, and NER

            Pronoun Coreference Resolution

            Word Sense Disambiguation

            Sentiment Analysis

            Relation Extraction

            Knowledge Bases

            Text Classification

            WSC / WNLI / NLI

            Commonsense Reasoning

            Extractive Summarization

            Information Retrieval

    Text Generation

    Quality Evaluation

    Modification (multi-task, masking strategies, etc.)

    Probe

    Multilingual

    Domain-Specific

    Multi-modal

    Model Compression

    

Paper List

    BERT-related (see the representation-probing sketch at the end of this list)

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019)

    ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (arXiv 2019)

    StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (arXiv 2019)

    RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv 2019)

    ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (arXiv 2019)

    Multi-Task Deep Neural Networks for Natural Language Understanding (arXiv 2019)

    What does BERT learn about the structure of language? (ACL2019)

    Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL2019) [github]

    Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)

    Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)

    What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)

    Do Attention Heads in BERT Track Syntactic Dependencies?

    Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)

    Inducing Syntactic Trees from BERT Representations (ACL2019 WS)

    A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)

    Visualizing and Measuring the Geometry of BERT

    How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings(EMNLP2019)

    Are Sixteen Heads Really Better than One? (NeurIPS2019)

    On the Validity of Self-Attention as Explanation in Transformer Models

    Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)

    Attention Interpretability Across NLP Tasks

    Revealing the Dark Secrets of BERT (EMNLP2019)

    Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)

    The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)

    A Primer in BERTology: What we know about how BERT works

    Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)

    How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)

    Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering

    What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?

    Calibration of Pre-trained Transformers

    exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models [github]
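
    Many of the papers above probe BERT's layer-wise representations and attention patterns. As a starting point for that kind of inspection, here is a minimal sketch (not taken from any of the papers); it assumes PyTorch, the Hugging Face transformers library, and the publicly released bert-base-uncased checkpoint.

```python
# Minimal sketch: extract per-layer contextual representations from BERT.
# Assumes: pip install torch transformers
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("BERT rediscovers the classical NLP pipeline.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors, each of shape
# [batch, seq_len, hidden_size]; layer-wise probing studies typically fit a
# lightweight classifier on each layer separately.
for layer_idx, layer in enumerate(outputs.hidden_states):
    print(layer_idx, tuple(layer.shape))
```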

    Transformer Series

    Attention Is All You Need (arXiv 2017)

    Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv 2019)

    Universal Transformers (ICLR 2019)

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv 2019)

    Reformer: The Efficient Transformer (ICLR 2020)

    Adaptive Attention Span in Transformers (ACL2019)

    Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]

    Generating Long Sequences with Sparse Transformers

    Adaptively Sparse Transformers (EMNLP2019)

    Compressive Transformers for Long-Range Sequence Modelling

    The Evolved Transformer (ICML2019)

    Reformer: The Efficient Transformer (ICLR2020) [github]

    GRET: Global Representation Enhanced Transformer (AAAI2020)

    Transformer on a Diet [github]

    Efficient Content-Based Sparse Attention with Routing Transformers

    BP-Transformer: Modelling Long-Range Context via Binary Partitioning

    Recipes for building an open-domain chatbot

    Longformer: The Long-Document Transformer

    Transfer Learning

    Deep contextualized word representations (NAACL 2018)

    Universal Language Model Fine-tuning for Text Classification (ACL 2018)

    Improving Language Understanding by Generative Pre-Training - Alec Radford

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019)

    Cloze-driven Pretraining of Self-attention Networks (arXiv 2019)

    Unified Language Model Pre-training for Natural Language Understanding and Generation (arXiv 2019)

    MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019)

    MPNet: Masked and Permuted Pre-training for Language Understanding [github]

    Text Summarization

    Positional Encoding to Control Output Sequence Length - Sho Takase(2019)

    Fine-tune BERT for Extractive Summarization - Yang Liu(2019)

    Language Models are Unsupervised Multitask Learners - Alec Radford(2019)

    A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss - Wan-Ting Hsu(2018)

    A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents - Arman Cohan(2018)

    GENERATING WIKIPEDIA BY SUMMARIZING LONG SEQUENCES - Peter J. Liu(2018)

    Get To The Point: Summarization with Pointer-Generator Networks - Abigail See(2017)

    A Neural Attention Model for Sentence Summarization - Alexander M. Rush(2015)

    Sentiment Analysis

    Multi-Task Deep Neural Networks for Natural Language Understanding - Xiaodong Liu(2019)

    Aspect-level Sentiment Analysis using AS-Capsules - Yequan Wang(2019)

    On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis - Jose Camacho-Collados(2018)

    Learned in Translation: Contextualized Word Vectors - Bryan McCann(2018)

    Universal Language Model Fine-tuning for Text Classification - Jeremy Howard(2018)

    Convolutional Neural Networks with Recurrent Neural Filters - Yi Yang(2018)

    Information Aggregation via Dynamic Routing for Sequence Encoding - Jingjing Gong(2018)

    Learning to Generate Reviews and Discovering Sentiment - Alec Radford(2017)

    A Structured Self-attentive Sentence Embedding - Zhouhan Lin(2017)

    Question Answering

    Language Models are Unsupervised Multitask Learners - Alec Radford(2019)

    Improving Language Understanding by Generative Pre-Training - Alec Radford(2018)

    Bidirectional Attention Flow for Machine Comprehension - Minjoon Seo(2018)

    Reinforced Mnemonic Reader for Machine Reading Comprehension - Minghao Hu(2017)

    Neural Variational Inference for Text Processing - Yishu Miao(2015)

    Machine Translation

    The Evolved Transformer - David R. So(2019)

    Survey papers

    Evolution of transfer learning in natural language processing

    Pre-trained Models for Natural Language Processing: A Survey

    A Survey on Contextual Embeddings

    Downstream task

    QA / MC / Dialogue

    A BERT Baseline for the Natural Questions

    MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)

    Unsupervised Domain Adaptation on Reading Comprehension

    BERTQA -- Attention on Steroids

    A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)

    SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering

    Multi-hop Question Answering via Reasoning Chains

    Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents

    Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering (EMNLP2019 WS)

    End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)

    Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)

    Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)

    Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (ICLR2020)

    Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)

    Unsupervised Question Answering by Cloze Translation (ACL2019)

    Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation

    A Recurrent BERT-based Model for Question Generation (EMNLP2019 WS)

    Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds

    Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)

    Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)

    SG-Net: Syntax-Guided Machine Reading Comprehension

    MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension

    Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (EMNLP2019)

    ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR2020)

    Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization

    BAS: An Answer Selection Method Using BERT Language Model

    Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension

    A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension(ACL2019 WS)

    FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)

    BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)

    GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension(ICML2019 WS)

    Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)

    XQA: A Cross-lingual Open-domain Question Answering Dataset (ACL2019)

    Cross-Lingual Machine Reading Comprehension (EMNLP2019)

    Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model

    Multilingual Question Answering from Formatted Text applied to Conversational Agents

    BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)

    MLQA: Evaluating Cross-lingual Extractive Question Answering

    Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension (TACL)

    SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis

    Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)

    BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer(Interspeech2019)

    Dialog State Tracking: A Neural Reading Comprehension Approach

    A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems (ICASSP2020)

    Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking

    Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker

    Domain Adaptive Training BERT for Response Selection

    BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding

    Slot filling

    BERT for Joint Intent Classification and Slot Filling

    Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model

    A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)

    Analysis

    Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention

    Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision (ACL2019)

    BERT-based Lexical Substitution (ACL2019)

    Assessing BERT’s Syntactic Abilities

    Does BERT agree? Evaluating knowledge of structure dependence through agreement relations

    Simple BERT Models for Relation Extraction and Semantic Role Labeling

    LIMIT-BERT : Linguistic Informed Multi-Task BERT

    A Simple BERT-Based Approach for Lexical Simplification

    Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)

    Towards Minimal Supervision BERT-based Grammar Error Correction

    BERT-Based Arabic Social Media Author Profiling

    Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media

    Evaluating the Factual Consistency of Abstractive Text Summarization

    NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution

    xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation

    TabFact: A Large-scale Dataset for Table-based Fact Verification

    Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents

    LAMBERT: Layout-Aware language Modeling using BERT for information extraction

    Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings (ECIR2020) [github]

    Keyphrase Extraction with Span-based Feature Representations

    What do you mean, BERT? Assessing BERT as a Distributional Semantics Model

    Word segmentation / parsing / NER

    BERT Meets Chinese Word Segmentation

    Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning

    Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT

    Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing

    NEZHA: Neural Contextualized Representation for Chinese Language Understanding

    Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)

    Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?

    Parsing as Pretraining (AAAI2020)

    Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing

    Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement

    Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)

    A Unified MRC Framework for Named Entity Recognition

    Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models

    Robust Named Entity Recognition with Truecasing Pretraining (AAAI2020)

    LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition

    MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers

    Portuguese Named Entity Recognition using BERT-CRF

    Towards Lingua Franca Named Entity Recognition with BERT

    Pronoun coreference resolution

    Resolving Gendered Ambiguous Pronouns with BERT (ACL2019 WS)

    Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge (ACL2019 WS)

    Gendered Pronoun Resolution using BERT and an extractive question answering formulation (ACL2019 WS)

    MSnet: A BERT-based Network for Gendered Pronoun Resolution (ACL2019 WS)

    Fill the GAP: Exploiting BERT for Pronoun Resolution (ACL2019 WS)

    On GAP Coreference Resolution Shared Task: Insights from the 3rd Place Solution (ACL2019 WS)

    Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution (ACL2019 WS)

    BERT Masked Language Modeling for Co-reference Resolution (ACL2019 WS)

    Coreference Resolution with Entity Equalization (ACL2019)

    BERT for Coreference Resolution: Baselines and Analysis (EMNLP2019) [github]

    WikiCREM: A Large Unsupervised Corpus for Coreference Resolution (EMNLP2019)

    Ellipsis and Coreference Resolution as Question Answering

    Coreference Resolution as Query-based Span Prediction

    Multi-task Learning Based Neural Bridging Reference Resolution

    Word sense disambiguation

    GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP2019)

    Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations (EMNLP2019)

    Using BERT for Word Sense Disambiguation

    Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation (ACL2019)

    Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings (KONVENS2019)

    Sentiment analysis

    Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)

    BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)

    Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)

    Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification

    An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)

    "Mask and Infill" : Applying Masked Language Model to Sentiment Transfer

    Adversarial Training for Aspect-Based Sentiment Analysis with BERT

    Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference

    Relation extraction

    Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)

    BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)

    Enriching Pre-trained Language Model with Entity Information for Relation Classification

    Span-based Joint Entity and Relation Extraction with Transformer Pre-training

    Fine-tune Bert for DocRED with Two-step Process

    Entity, Relation, and Event Extraction with Contextualized Span Representations (EMNLP2019)

    Knowledge base

    KG-BERT: BERT for Knowledge Graph Completion

    Language Models as Knowledge Bases? (EMNLP2019) [github]

    BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA

    Inducing Relational Knowledge from BERT (AAAI2020)

    Latent Relation Language Models (AAAI2020)

    Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model (ICLR2020)

    Zero-shot Entity Linking with Dense Entity Retrieval

    Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL2019)

    Improving Entity Linking by Modeling Latent Entity Type Information (AAAI2020)

    PEL-BERT: A Joint Model for Protocol Entity Linking

    How Can We Know What Language Models Know?

    REALM: Retrieval-Augmented Language Model Pre-Training

    Text classification

    How to Fine-Tune BERT for Text Classification?

    X-BERT: eXtreme Multi-label Text Classification with BERT

    DocBERT: BERT for Document Classification

    Enriching BERT with Knowledge Graph Embeddings for Document Classification

    Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)

    BERT for Evidence Retrieval and Claim Verification

    Stacked DeBERT: All Attention in Incomplete Data for Text Classification

    Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data

    WSC / WNLI / NLI

    Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge

    A Surprisingly Robust Trick for the Winograd Schema Challenge

    WinoGrande: An Adversarial Winograd Schema Challenge at Scale (AAAI2020)

    Improving Natural Language Inference with a Pretrained Parser

    Adversarial NLI: A New Benchmark for Natural Language Understanding

    Adversarial Analysis of Natural Language Inference Systems (ICSC2020)

    HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference (LREC2020)

    Evaluating BERT for natural language inference: A case study on the CommitmentBank (EMNLP2019)

    Commonsense

    CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (NAACL2019)

    HellaSwag: Can a Machine Really Finish Your Sentence? (ACL2019) [website]

    Story Ending Prediction by Transferable BERT (IJCAI2019)

    Explain Yourself! Leveraging Language Models for Commonsense Reasoning (ACL2019)

    Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

    Informing Unsupervised Pretraining with External Linguistic Knowledge

    Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test

    BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge

    Commonsense Knowledge Mining from Pretrained Models (EMNLP2019)

    KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP2019)

    Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)

    Do Massively Pretrained Language Models Make Better Storytellers? (CoNLL2019)

    PIQA: Reasoning about Physical Commonsense in Natural Language (AAAI2020)

    Evaluating Commonsense in Pre-trained Language Models (AAAI2020)

    Why Do Masked Neural Language Models Still Need Common Sense Knowledge?

    Do Neural Language Representations Learn Physical Commonsense? (CogSci2019)

    Extractive summarization

    HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)

    Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression

    Discourse-Aware Neural Extractive Model for Text Summarization

    PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization [github]

    IR

    Passage Re-ranking with BERT

    Investigating the Successes and Failures of BERT for Passage Re-Ranking

    Understanding the Behaviors of BERT in Ranking

    Document Expansion by Query Prediction

    CEDR: Contextualized Embeddings for Document Ranking (SIGIR2019)

    Deeper Text Understanding for IR with Contextual Neural Language Modeling (SIGIR2019)

    FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance (SIGIR2019)

    Multi-Stage Document Ranking with BERT

    REALM: Retrieval-Augmented Language Model Pre-Training

    Generation

    BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)

    Pretraining-Based Natural Language Generation for Text Summarization

    Text Summarization with Pretrained Encoders (EMNLP2019) [github (original)] [github (huggingface)]

    Multi-stage Pretraining for Abstractive Summarization

    PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

    MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github], [github]

    Unified Language Model Pre-training for Natural Language Understanding and Generation [github] (NeurIPS2019)

    UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [github]

    ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

    Towards Making the Most of BERT in Neural Machine Translation

    Improving Neural Machine Translation with Pre-trained Representation

    On the use of BERT for Neural Machine Translation (EMNLP2019 WS)

    Incorporating BERT into Neural Machine Translation (ICLR2020)

    Recycling a Pre-trained BERT Encoder for Neural Machine Translation

    Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

    Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

    Cross-Lingual Natural Language Generation via Pre-Training (AAAI2020) [github]

    Multilingual Denoising Pre-training for Neural Machine Translation

    PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

    Unsupervised Pre-training for Natural Language Generation: A Literature Review

    Quality evaluator (see the BERTScore sketch at the end of this list)

    BERTScore: Evaluating Text Generation with BERT (ICLR2020)

    Machine Translation Evaluation with BERT Regressor

    SumQE: a BERT-based Summary Quality Estimation Model (EMNLP2019)

    MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance (EMNLP2019) [github]

    BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward
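
    Several of the metrics above (BERTScore, MoverScore) score generated text by matching contextual embeddings against a reference. A minimal example of the BERTScore workflow is sketched below; it assumes the open-source bert-score package, and the candidate/reference sentences are made-up placeholders rather than data from any of these papers.

```python
# Illustrative BERTScore usage. Assumes: pip install bert-score
from bert_score import score

candidates = ["The cat sat on the mat."]
references = ["A cat was sitting on the mat."]

# Returns per-sentence precision, recall, and F1 tensors computed by greedy
# matching of contextual token embeddings between candidate and reference.
P, R, F1 = score(candidates, references, lang="en", verbose=True)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```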

    Modification (multi-task, masking strategy, etc.)

    Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)

    The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding

    BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)

    Unifying Question Answering and Text Classification via Span Extraction

    ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)

    ERNIE: Enhanced Representation through Knowledge Integration

    ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (AAAI2020)

    Pre-Training with Whole Word Masking for Chinese BERT

    SpanBERT: Improving Pre-training by Representing and Predicting Spans [github]

    Blank Language Models

    Efficient Training of BERT by Progressively Stacking (ICML2019) [github]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]

    ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR2020)

    ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR2020) [github] [blog]

    FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR2020)

    KERMIT: Generative Insertion-Based Modeling for Sequences

    DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)

    StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR2020)

    Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

    SenseBERT: Driving Some Sense into BERT

    Semantics-aware BERT for Language Understanding (AAAI2020)

    K-BERT: Enabling Language Representation with Knowledge Graph

    Knowledge Enhanced Contextual Word Representations (EMNLP2019)

    KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019) (a sentence-embedding sketch follows at the end of this list)

    SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models

    Universal Text Representation from BERT: An Empirical Study

    Symmetric Regularization based BERT for Pair-wise Semantic Reasoning

    Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)

    Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)

    SesameBERT: Attention for Anywhere

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [github]

    SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
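
    For the sentence-embedding line of work above (Sentence-BERT, SBERT-WK), the basic usage pattern is to encode each sentence into a single vector and compare vectors with cosine similarity. The sketch below uses the sentence-transformers package; the checkpoint name is an illustrative assumption, not something prescribed by the papers.

```python
# Sentence-embedding sketch in the Sentence-BERT style.
# Assumes: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

# Any SBERT-style checkpoint works; this name is an assumption for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "BERT produces contextual token embeddings.",
    "Sentence-BERT maps a whole sentence to a single vector.",
]
embeddings = model.encode(sentences)  # numpy array of shape (2, dim)

# Cosine similarity between the two sentence vectors.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(f"cosine similarity: {float(normed[0] @ normed[1]):.3f}")
```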

    Probe

    A Structural Probe for Finding Syntax in Word Representations (NAACL2019)

    Linguistic Knowledge and Transferability of Contextual Representations (NAACL2019) [github]

    Probing What Different NLP Tasks Teach Machines about Function Word Comprehension (*SEM2019)

    BERT Rediscovers the Classical NLP Pipeline (ACL2019)

    Probing Neural Network Comprehension of Natural Language Arguments (ACL2019)

    Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)

    What do you mean, BERT? Assessing BERT as a Distributional Semantics Model

    Quantity doesn't buy quality syntax with neural language models (EMNLP2019)

    Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction (ICLR2020)

    oLMpics -- On what Language Model Pre-training Captures

    How Much Knowledge Can You Pack Into the Parameters of a Language Model?

    What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge

    Multi-lingual

    Multilingual Constituency Parsing with Self-Attention and Pre-Training (ACL2019)

    Cross-lingual Language Model Pretraining (NeurIPS2019) [github]

    75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP2019) [github]

    Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations (EMNLP2019 WS)

    Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (EMNLP2019)

    How multilingual is Multilingual BERT? (ACL2019)

    How Language-Neutral is Multilingual BERT?

    Is Multilingual BERT Fluent in Language Generation?

    Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks (EMNLP2019)

    BERT is Not an Interlingua and the Bias of Tokenization (EMNLP2019 WS)

    Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR2020)

    Multilingual Alignment of Contextual Word Representations (ICLR2020)

    On the Cross-lingual Transferability of Monolingual Representations

    Unsupervised Cross-lingual Representation Learning at Scale

    Emerging Cross-lingual Structure in Pretrained Language Models

    Can Monolingual Pretrained Models Help Cross-Lingual Classification?

    Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data (CoNLL2019)

    What the [MASK]? Making Sense of Language-Specific BERT Models

    XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

    Other than English models

    CamemBERT: a Tasty French Language Model

    FlauBERT: Unsupervised Language Model Pre-training for French

    Multilingual is not enough: BERT for Finnish

    BERTje: A Dutch BERT Model

    RobBERT: a Dutch RoBERTa-based Language Model

    Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

    AraBERT: Transformer-based Model for Arabic Language Understanding

    PhoBERT: Pre-trained language models for Vietnamese

    CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model

    Domain specific

    BioBERT: a pre-trained biomedical language representation model for biomedical text mining

    Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (ACL2019 WS)

    BERT-based Ranking for Biomedical Entity Normalization

    PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP2019)

    Pre-trained Language Model for Biomedical Question Answering

    How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering

    ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

    Publicly Available Clinical BERT Embeddings (NAACL2019 WS)

    Progress Notes Classification and Keyword Extraction using Attention-based Deep Learning Models with BERT

    SciBERT: Pretrained Contextualized Embeddings for Scientific Text [github]

    PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model

    Multi-modal

    VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)

    ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)

    VisualBERT: A Simple and Performant Baseline for Vision and Language

    Selfie: Self-supervised Pretraining for Image Embedding

    ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

    Contrastive Bidirectional Transformer for Temporal Representation Learning

    M-BERT: Injecting Multimodal Information in the BERT Structure

    LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)

    Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)

    BERT representations for Video Question Answering (WACV2020)

    Unified Vision-Language Pre-Training for Image Captioning and VQA [github]

    Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

    VL-BERT: Pre-training of Generic Visual-Linguistic Representations (ICLR2020)

    Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training

    UNITER: Learning UNiversal Image-TExt Representations

    Supervised Multimodal Bitransformers for Classifying Images and Text

    Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks

    BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations

    BERT for Large-scale Video Segment Classification with Test-time Augmentation (ICCV2019WS)

    SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering

    vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

    Effectiveness of self-supervised pre-training for speech recognition

    Understanding Semantics from Speech Through Pre-training

    Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models

    Model compression (a distillation-loss sketch follows this list)

    Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

    Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)

    Small and Practical BERT Models for Sequence Labeling (EMNLP2019)

    Pruning a BERT-based Question Answering Model

    TinyBERT: Distilling BERT for Natural Language Understanding [github]

    DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]

    Knowledge Distillation from Internal Representations (AAAI2020)

    PoWER-BERT: Accelerating BERT inference for Classification Tasks

    WaLDORf: Wasteless Language-model Distillation On Reading-comprehension

    Extreme Language Model Compression with Optimal Subwords and Shared Projections

    BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

    Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

    MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

    Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

    Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

    MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer

    Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

    Q8BERT: Quantized 8Bit BERT (NeurIPS2019 WS)
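
    The distillation papers in this list (DistilBERT, Patient Knowledge Distillation, TinyBERT, MiniLM) share a common core objective: the student is trained on a mix of the usual hard-label loss and a KL term that matches the teacher's temperature-softened output distribution. The generic sketch below illustrates that combined loss in PyTorch; the temperature and mixing weight are illustrative defaults rather than values from any specific paper.

```python
# Generic soft-label distillation loss (teacher -> student). Assumes PyTorch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the teacher's
    temperature-softened distribution (scaled by T^2 so gradient magnitudes
    stay comparable across temperatures)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage with random logits for a 3-class task.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student_logits, teacher_logits, labels))
```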

    Misc

    jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models [github]

    Cloze-driven Pretraining of Self-attention Networks

    Learning and Evaluating General Linguistic Intelligence

    To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (ACL2019 WS)

    Learning to Speak and Act in a Fantasy Text Adventure Game (EMNLP2019)

    Conditional BERT Contextual Augmentation

    Data Augmentation using Pre-trained Transformer Models

    Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (ICLR2020)

    Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models (ICLR2020)

    A Mutual Information Maximization Perspective of Language Representation Learning (ICLR2020)

    Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment (AAAI2020)

    Thieves on Sesame Street! Model Extraction of BERT-based APIs (ICLR2020)

    Graph-Bert: Only Attention is Needed for Learning Graph Representations

    CodeBERT: A Pre-Trained Model for Programming and Natural Languages

    Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

    Extending Machine Language Models toward Human-Level Language Understanding

    Glyce: Glyph-vectors for Chinese Character Representations

    Back to the Future -- Sequential Alignment of Text Representations

    Improving Cuneiform Language Identification with BERT (NAACL2019 WS)

    BERT has a Moral Compass: Improvements of ethical and moral values of machines

    SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (ACM-BCB2019)

    On the comparability of Pre-trained Language Models

    Transformers: State-of-the-art Natural Language Processing

    Jukebox: A Generative Model for Music

    WT5?! Training Text-to-Text Models to Explain their Predictions

Reposted from blog.csdn.net/lqfarmer/article/details/106308091