!CIKM2020
-
【SMITH】(long-form document matching) Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
A two-level hierarchical dual-Transformer architecture: the first level encodes the tokens within each sentence block, and the second level builds the document representation from the sentence-block representations (a minimal sketch follows below).
Proposes a pre-train + fine-tune approach, where pre-training adds a masked sentence block language model task on top of BERT's token-level MLM.
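To make the two-level structure concrete, here is a minimal PyTorch sketch. All dimensions, the first-token pooling, and the mean pooling at the document level are illustrative assumptions; the real SMITH uses BERT-style sentence blocks, positional embeddings, and the masked sentence-block pretraining described above.

```python
import torch
import torch.nn as nn

class HierarchicalDocEncoder(nn.Module):
    # Positional embeddings and attention masks are omitted for brevity.
    def __init__(self, vocab_size=30522, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Level 1: Transformer over the tokens inside each sentence block.
        self.sent_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        # Level 2: Transformer over the sequence of sentence-block vectors.
        self.doc_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)

    def forward(self, token_ids):
        # token_ids: (num_blocks, block_len) -- one document split into
        # fixed-length sentence blocks.
        h = self.sent_encoder(self.tok_emb(token_ids))    # (blocks, len, d)
        block_reprs = h[:, 0]                             # pool first token of each block
        doc = self.doc_encoder(block_reprs.unsqueeze(0))  # (1, blocks, d)
        return doc.mean(dim=1)                            # (1, d) document embedding

# Siamese matching: both documents share the encoder, compared by cosine similarity.
enc = HierarchicalDocEncoder()
doc_a = enc(torch.randint(0, 30522, (8, 32)))   # 8 blocks x 32 tokens
doc_b = enc(torch.randint(0, 30522, (8, 32)))
score = torch.cosine_similarity(doc_a, doc_b)
```

Because each document is encoded independently, document embeddings can be precomputed and matched cheaply, which is the point of the Siamese setup for long-form inputs.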
-
Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
-
Dual Head-wise Coattention Network for Machine Comprehension with Multiple-Choice Questions
-
Robust Retrievability based Document Selection for Relevance Feedback with Automatically Generated Query Variants (paper not found)
-
Transformer Model Compression via Joint Structured Pruning and Knowledge Distillation (paper not found)
-
TABLE: A Task-Adaptive BERT-based ListwisE Ranking Model for Document Retrieval (paper not found)
-
CGTR: Convolution Graph Topology Representation for Document Ranking
We propose a GCN-based method that encodes global structure into contextual embeddings to improve ad-hoc document retrieval.
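The note above is terse; as a reference for what a GCN propagation step over contextual embeddings looks like, here is a generic single-layer sketch. The graph construction is a placeholder assumption; CGTR's specific graph topology and layer details are defined in the paper.

```python
# One GCN layer: H' = ReLU( D^{-1/2} (A + I) D^{-1/2} H W )
import numpy as np

def gcn_layer(H, A, W):
    """H: (n_nodes, d_in) contextual embeddings; A: (n, n) adjacency;
    W: (d_in, d_out) learnable weights."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))        # 5 nodes (e.g., terms), 8-dim embeddings
A = (rng.random((5, 5)) > 0.5).astype(float)
A = np.maximum(A, A.T)             # symmetrize the toy adjacency
H_next = gcn_layer(H, A, rng.normal(size=(8, 8)))
```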
-
【DMIN】Deep Multi-Interest Network for Click-through Rate Prediction
Multi-Interest Network with Dynamic Routing for Recommendation at Tmall
-
Distant Supervision in BERT-based Ad-hoc Document Retrieval
We find that directly transferring relevance labels from documents to passages introduces label noise and cannot leverage the information in the large amount of unlabelled documents.
We show that our distantly supervised models (QA-DocRank and QA-Full-DocRank) outperform the baselines. Extracting relevant passages from unlabelled documents (QA-Full-DocRank) helps address the label-scarcity problem.
QA-DocRank: first score each passage with a QA model, threshold the scores to derive query-passage relevance labels, then fine-tune BERT on these labeled pairs. At inference time, score every passage first, then aggregate the passage scores into a document score with a strategy such as FirstP/MaxP/SumP/AvgP (sketched after these notes).
QA-Full-DocRank: extends QA-DocRank by additionally scoring passages from unlabelled documents with the QA model, thresholding to obtain extra query-passage relevance labels, and fine-tuning BERT on the enlarged labeled set.
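A hypothetical sketch of the two steps above: (1) threshold QA-model passage scores into distant relevance labels for BERT fine-tuning, and (2) aggregate per-passage scores into a document score at inference. Function names and the threshold value are illustrative, not from the paper.

```python
def distant_labels(passage_scores, threshold=0.5):
    """Turn QA-model scores into binary query-passage relevance labels."""
    return [1 if s >= threshold else 0 for s in passage_scores]

def doc_score(passage_scores, strategy="MaxP"):
    """Aggregate fine-tuned-BERT passage scores into one document score."""
    if strategy == "FirstP":
        return passage_scores[0]                          # first passage only
    if strategy == "MaxP":
        return max(passage_scores)                        # best passage
    if strategy == "SumP":
        return sum(passage_scores)                        # sum over passages
    if strategy == "AvgP":
        return sum(passage_scores) / len(passage_scores)  # mean over passages
    raise ValueError(f"unknown strategy: {strategy}")

scores = [0.95, 0.40, 0.05, 0.88]
print(distant_labels(scores))        # [1, 0, 0, 1]
print(doc_score(scores, "MaxP"))     # 0.95
```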