(Done) ACL 2020 (571 long papers and 208 short papers)
-
(Efficiency) DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (decouples the lower layers of BERT)
-
(Efficiency) DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Hypothesis: for some samples, a correct prediction can be obtained without going through all layers of BERT/RoBERTa, i.e., some layers are redundant. Based on this, the proposed method adds a classification layer after each Transformer layer of a pre-trained BERT/RoBERTa and fine-tunes the parameters in two stages; at inference time, as soon as some layer's classifier reaches the confidence threshold, the model exits early. Evaluated on all GLUE tasks, it clearly improves efficiency at the cost of a slight drop in performance.
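A minimal sketch of the early-exit idea described above, assuming the entropy of each per-layer classifier's output as the confidence measure; the layer sizes, pooling, and threshold are illustrative choices, not DeeBERT's exact implementation.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Toy BERT-like encoder with an exit classifier after every layer."""

    def __init__(self, num_layers=12, hidden=768, num_labels=2, entropy_threshold=0.2):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
            for _ in range(num_layers)
        ])
        self.exits = nn.ModuleList([nn.Linear(hidden, num_labels) for _ in range(num_layers)])
        self.entropy_threshold = entropy_threshold

    @torch.no_grad()
    def forward(self, x):  # x: [1, seq_len, hidden]
        for i, (layer, exit_head) in enumerate(zip(self.layers, self.exits)):
            x = layer(x)
            probs = torch.softmax(exit_head(x[:, 0]), dim=-1)  # pool the first token
            entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).item()
            if entropy < self.entropy_threshold:
                return probs, i + 1  # confident enough: exit after i + 1 layers
        return probs, len(self.layers)  # no early exit, used every layer

# probs, layers_used = EarlyExitEncoder()(torch.randn(1, 16, 768))
```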
-
(Efficiency, Distillation) FastBERT: a Self-distilling BERT with Adaptive Inference Time
The overall architecture and motivation are similar to DeeBERT; the difference is how the per-layer classifiers are trained. This paper trains them with self-distillation, which can exploit unlimited unlabeled data, and the final results are better than DeeBERT's. Experiments on 6 Chinese and 6 English datasets show a 3-5x speedup with almost no drop in performance.
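A sketch of a self-distillation objective in the spirit described above: each intermediate (student) classifier is trained to match the final (teacher) classifier, so no labels are needed. The KL direction and the lack of a temperature are assumptions, not FastBERT's exact loss.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits_per_layer, teacher_logits):
    """Sum of KL divergences between the teacher distribution (final classifier)
    and each early-layer student classifier; works on unlabeled inputs."""
    teacher_probs = F.softmax(teacher_logits.detach(), dim=-1)  # teacher is not updated by this loss
    loss = torch.zeros(())
    for logits in student_logits_per_layer:
        loss = loss + F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs, reduction="batchmean")
    return loss

# loss = self_distillation_loss([torch.randn(8, 2) for _ in range(11)], torch.randn(8, 2))
```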
-
(Efficiency) HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
-
(Efficiency, Model Compression) MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
-
(Efficiency, Sparse Representations) Contextualized Sparse Representations for Real-Time Open-Domain Question Answering
An improvement over the sparse-representation part of DenSPI: it adds learnable sparse representations, built with a kernel-function idea.
-
A Mixture of h−1 Heads is Better than h Heads
-
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Modifies the Transformer architecture with a highway-like mechanism, which speeds up convergence and also yields better final performance. Main motivation: the multi-head attention in the Transformer only keeps the relations between words and does not preserve the features of the words themselves.
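A generic highway-style gate to illustrate the idea above; the paper's own self-gating unit may differ in detail.

```python
import torch
import torch.nn as nn

class HighwayGate(nn.Module):
    """Blend a sublayer's output (e.g. multi-head attention) with the original
    token features through a learned per-dimension carry gate, so each word's
    own representation is not washed out by the attention mixing."""

    def __init__(self, hidden):
        super().__init__()
        self.gate = nn.Linear(hidden, hidden)

    def forward(self, x, sublayer_out):
        g = torch.sigmoid(self.gate(x))           # how much of the transform to keep
        return g * sublayer_out + (1.0 - g) * x   # highway-style mix of transform and identity

# mha = nn.MultiheadAttention(768, 12, batch_first=True)
# x = torch.randn(2, 16, 768)
# y = HighwayGate(768)(x, mha(x, x, x)[0])
```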
-
How Does Selective Mechanism Improve Self-Attention Networks?
-
Combining Subword Representations into Word-level Representations in the Transformer Architecture
-
(Interpretability) Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
1) We suggest an analysis method which helps understand where linguistic properties are learned and represented across attention heads in Transformer architectures, and 2) we show that, using the analysis results, attention heads can be maximally utilized for performance gains when fine-tuning on downstream tasks and for capturing linguistic properties.
-
(Pre-training) BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Mainly targets generative tasks (translation, summarization, QA, dialogue) while keeping classification performance competitive.
The paper also compares the impact of different pre-training objectives and noising functions.
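For reference, a minimal sketch of two of the noising functions the paper compares (text infilling and sentence permutation); the span-length distribution and masking ratio below are illustrative defaults, not the exact training configuration.

```python
import random
import numpy as np

def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, poisson_lam=3.0):
    """Text infilling: spans of tokens (lengths ~ Poisson) are each replaced by a
    single mask token; a length-0 span simply inserts a mask."""
    tokens = list(tokens)
    budget = int(len(tokens) * mask_ratio)
    while budget > 0 and len(tokens) > 1:
        span = min(int(np.random.poisson(poisson_lam)), budget, len(tokens) - 1)
        start = random.randrange(0, len(tokens) - span + 1)
        tokens[start:start + span] = [mask_token]  # the whole span collapses into ONE mask token
        budget -= max(span, 1)
    return tokens

def sentence_permutation(sentences):
    """Sentence permutation: shuffle the order of sentences in a document."""
    sentences = list(sentences)
    random.shuffle(sentences)
    return sentences

# noisy = text_infilling("the quick brown fox jumps over the lazy dog".split())
```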
-
BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance
-
??schuBERT: Optimizing Elements of BERT
-
SenseBERT: Driving Some Sense into BERT
-
(Interpretability) Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT
-
(Pre-training) Span Selection Pre-training for Question Answering
-
(Representation Learning) Attentive Pooling with Learnable Norms for Text Representation
We propose an Attentive Pooling with Learnable Norms (APLN) approach for text representation, with two methods to ensure numerically stable training: the first is scale limiting, which limits the scale of the input representations to keep them non-negative and avoid a potential exponential explosion; the second is re-formulation, which decomposes the exponent operation into several safe atomic operations, avoiding direct computation of real-valued powers of the input features and lowering the computational cost. A minimal pooling sketch follows below.
Evaluated on text (sentiment) classification.
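A sketch of attentive pooling with a learnable exponent, where scale limiting is implemented as clamping; the scoring function, clamp range, and initialization are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LearnableNormAttentivePooling(nn.Module):
    """Attentive pooling whose weights use a learnable exponent ("norm") p."""

    def __init__(self, hidden, eps=1e-6, max_scale=10.0, init_p=2.0):
        super().__init__()
        self.scorer = nn.Linear(hidden, 1)
        self.p = nn.Parameter(torch.tensor(init_p))  # learnable norm / exponent
        self.eps, self.max_scale = eps, max_scale

    def forward(self, x, mask=None):
        # x: [batch, seq_len, hidden]; mask: [batch, seq_len] with 1 for real tokens
        s = torch.sigmoid(self.scorer(x)).squeeze(-1)            # non-negative raw scores
        s = s.clamp(self.eps, self.max_scale)                    # "scale limiting"
        w = torch.exp(self.p * torch.log(s))                     # s**p via safe atomic ops
        if mask is not None:
            w = w * mask
        w = w / w.sum(dim=1, keepdim=True).clamp_min(self.eps)   # normalize to pooling weights
        return torch.bmm(w.unsqueeze(1), x).squeeze(1)           # weighted-sum text vector

# vec = LearnableNormAttentivePooling(128)(torch.randn(4, 20, 128))
```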
-
(MRC) Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension
We model documents at different levels of granularity to learn the hierarchical nature of the document. On the Natural Questions dataset, which contains two sub-tasks (predicting a paragraph-level long answer and a token-level short answer), our method jointly trains the two sub-tasks to consider the dependencies between the two-grained answers.
-
(Representation Learning) Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning
We propose a new learning objective named language autoencoding (LAE) for obtaining fully contextualized language representations without repetition. To learn the proposed LAE, we develop a diagonal masking operation and an input isolation mechanism inside the T-TA (Transformer-based Text Autoencoder).
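A minimal sketch of the two tricks as I read them: the diagonal mask prevents each token from attending to itself, and input isolation feeds only position information as the query so a token's own embedding cannot leak into its output; sizes and the exact wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

def diagonal_attention_mask(seq_len):
    """Block each position from attending to itself, so a token's output must be
    reconstructed from every *other* token in a single pass."""
    mask = torch.zeros(seq_len, seq_len)
    mask.fill_diagonal_(float("-inf"))
    return mask

hidden, seq_len = 64, 8
attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
tok = torch.randn(1, seq_len, hidden)   # token embeddings feed keys/values only
pos = torch.randn(1, seq_len, hidden)   # position embeddings feed the queries ("input isolation")
out, _ = attn(query=pos, key=tok, value=tok, attn_mask=diagonal_attention_mask(seq_len))
```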
-
QUASE: Question-Answer Driven Sentence Encoding
-
Reasoning Over Semantic-Level Graph for Fact Checking
The main contribution of this work is the graph-based reasoning approach for claim verification.
-
(Representation Learning) SPECTER: Document-level Representation Learning using Citation-informed Transformers (uses the citation relations between academic papers to learn better document-level representations of them)
-
(Curriculum Learning) Curriculum Learning for Natural Language Understanding
We explore and demonstrate the effectiveness of CL when fine-tuning LMs on NLU tasks. We propose a novel CL framework that consists of a Difficulty Review method and a Curriculum Arrangement algorithm.
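A generic curriculum sketch under stated assumptions: difficulty comes from some external scorer (e.g. a teacher model's loss, passed in as a hypothetical `difficulty` callable), and the arrangement simply grows the training pool from easy to hard in stages; the paper's own Difficulty Review and Curriculum Arrangement procedures may differ.

```python
import random

def arrange_curriculum(examples, difficulty, num_buckets=5):
    """Order training examples from easy to hard and return one training set
    per curriculum stage, where stage t contains buckets 1..t."""
    ranked = sorted(examples, key=difficulty)                 # easiest first
    size = (len(ranked) + num_buckets - 1) // num_buckets
    buckets = [ranked[i:i + size] for i in range(0, len(ranked), size)]

    curriculum, seen = [], []
    for bucket in buckets:
        seen.extend(bucket)
        stage = list(seen)
        random.shuffle(stage)                                 # shuffle within a stage only
        curriculum.append(stage)
    return curriculum

# stages = arrange_curriculum(train_examples, difficulty=lambda ex: teacher_loss[ex["id"]])
```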
-
(Question Generation) How to Ask Good Questions? Try to Leverage Paraphrases
-
(Question Generation) Semantic Graphs for Generating Deep Questions
Deep question generation: generating complex questions that require reasoning over multiple text spans.
We propose a novel framework which first constructs a semantic-level graph for the input document and then encodes the semantic graph with an attention-based GGNN (Att-GGNN). Afterwards, we fuse the document-level and graph-level representations to perform joint training of content selection and question decoding.
-
(Product Search) Learning Robust Models for e-Commerce Product Search (generates mismatched queries from matched query-item pairs, so as to learn a more robust decision boundary and better identify mismatched query-item pairs)
-
(FAQ) Unsupervised FAQ Retrieval with Question Generation and BERT
We presented a fully unsupervised method for FAQ retrieval. The method is based on an initial retrieval of FAQ candidates followed by three re-rankers: the first is based on an IR passage-retrieval approach, and the other two are independent BERT models fine-tuned to predict query-to-answer and query-to-question matching.
-
The Cascade Transformer: an Application for Efficient Answer Sentence Selection
This work introduces the Cascade Transformer (CT), a variant of the traditional transformer model designed to improve inference throughput. The approach places classifiers at different encoding stages to prune candidates within a batch and thereby improve model throughput.
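A sketch of cascade-style pruning at inference time: the encoder is split into stages, and after each stage a small scoring head drops the lowest-scoring candidates so deeper stages see a smaller batch. Stage sizes, keep_ratio, and the scoring heads are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CascadeReranker(nn.Module):
    """Encoder split into stages; after each stage a ranker scores every candidate
    in the batch and the lowest-scoring ones are pruned before the next stage."""

    def __init__(self, num_stages=3, layers_per_stage=2, hidden=256, keep_ratio=0.5):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(*[
                nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
                for _ in range(layers_per_stage)
            ])
            for _ in range(num_stages)
        ])
        self.rankers = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(num_stages)])
        self.keep_ratio = keep_ratio

    @torch.no_grad()
    def forward(self, candidates):
        # candidates: [num_candidates, seq_len, hidden]; each row is one answer candidate
        keep_idx = torch.arange(candidates.size(0))
        for stage, ranker in zip(self.stages, self.rankers):
            candidates = stage(candidates)
            scores = ranker(candidates[:, 0]).squeeze(-1)     # first-token score per candidate
            k = max(1, int(len(scores) * self.keep_ratio))    # keep only the top fraction
            top = scores.topk(k).indices
            kept_scores = scores[top]
            candidates, keep_idx = candidates[top], keep_idx[top]
        return keep_idx, kept_scores                          # surviving candidates and final scores

# surviving, final_scores = CascadeReranker()(torch.randn(40, 32, 256))
```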
-
(Short Text Matching, Graph Networks) Neural Graph Matching Networks for Chinese Short Text Matching
We propose a neural graph matching model for Chinese short text matching.
-
(Semantic Matching) tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection
We propose a flexible framework for combining topic models with BERT.
-
(Natural Language Generation) A Generative Model for Joint Natural Language Understanding and Generation