Retrieve-and-Sample

Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation

https://papers.cool/venue/2023.acl-long.17@ACL

Authors: Yubing Ren ; Yanan Cao ; Ping Guo ; Fang Fang ; Wei Ma ; Zheng Lin

Summary: Recent studies have shown the effectiveness of retrieval augmentation in many generative NLP tasks. These retrieval-augmented methods allow models to explicitly acquire prior external knowledge in a non-parametric manner and regard the retrieved reference instances as cues to augment text generation. These methods use similarity-based retrieval, which is based on a simple hypothesis: the more the retrieved demonstration resembles the original input, the more likely the demonstration label resembles the input label. However, due to the complexity of event labels and sparsity of event arguments, this hypothesis does not always hold in document-level EAE. This raises an interesting question: How do we design the retrieval strategy for document-level EAE? We investigate various retrieval settings from the input and label distribution views in this paper. We further augment document-level EAE with pseudo demonstrations sampled from event semantic regions that can cover adequate alternatives in the same context and event schema. Through extensive experiments on RAMS and WikiEvents, we demonstrate the validity of our newly introduced retrieval-augmented methods and analyze why they work.
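
To make the similarity hypothesis concrete, here is a minimal sketch of similarity-based demonstration retrieval. It is not the paper's retriever: the bag-of-words encoder stands in for a learned one, and the `STORE` contents, role names, and function names are all hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Lower-cased bag-of-words counts (a stand-in for a learned encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, store, k=1):
    """Return the k (context, label) pairs whose context is most similar to query."""
    q = bow(query)
    ranked = sorted(store, key=lambda d: cosine(q, bow(d[0])), reverse=True)
    return ranked[:k]

# Hypothetical demonstration store: (context, argument label) pairs.
STORE = [
    ("the army attacked the village at dawn", "attacker=army; target=village"),
    ("the company bought a startup in march", "buyer=company; artifact=startup"),
    ("rebels attacked the convoy near the border", "attacker=rebels; target=convoy"),
]

top = retrieve("soldiers attacked the town at night", STORE, k=2)
for context, label in top:
    print(context, "->", label)
```

The retriever ranks purely on surface overlap with the input. Whether the retrieved label is also a good analogy for the input's label is exactly the assumption the abstract says can fail for document-level EAE.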

Q: What problem does this paper try to solve?

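A: The paper addresses document-level event argument extraction (EAE). The goal of document-level EAE is to distill the informative core of an event from a whole document by extracting the event's arguments and assigning each one to its correct role. The task matters for downstream applications such as information retrieval, question answering, and event graph reasoning. However, because event labels are complex and event arguments are sparse, conventional similarity-based retrieval does not always work for document-level EAE. The paper proposes a method called "Retrieve-and-Sample", which improves document-level EAE through hybrid retrieval augmentation.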

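Specifically, the paper explores the following key questions:

- How to design a retrieval strategy suited to document-level EAE, so that the model can use retrieved reference instances (demonstrations) to augment text generation.
- How retrieval augmentation can improve performance on document-level EAE, particularly when event schemas are complex and event arguments are sparse.
- How pseudo demonstrations drawn from a continuous space can serve as deeper cues that strengthen the model's ability to reason by analogy on document-level EAE.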

Through extensive experiments on the RAMS and WikiEvents datasets, the paper validates the proposed retrieval-augmented methods and analyzes why they work.
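
The third question above mentions pseudo demonstrations drawn from a continuous space. This summary does not describe the paper's sampling procedure, so the sketch below only illustrates the general idea of drawing points from a region around an embedding. The isotropic Gaussian, the `sigma` parameter, and the function name are assumptions made for illustration.

```python
import random

def sample_pseudo_demos(center, n, sigma=0.1, seed=0):
    """Draw n vectors near `center` by adding Gaussian noise to each coordinate.

    Assumption: an isotropic Gaussian region is used here for simplicity;
    it is not taken from the paper.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [[x + rng.gauss(0.0, sigma) for x in center] for _ in range(n)]

# Hypothetical 3-dimensional embedding of one event context.
center = [0.2, -0.5, 0.9]
demos = sample_pseudo_demos(center, n=4)
for vec in demos:
    print([round(v, 3) for v in vec])
```

Unlike retrieved demonstrations, which are limited to instances that exist in the store, sampled points can cover positions in the space for which no stored instance exists.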

Q: What related work is there?

A: Related work falls mainly into two areas: document-level event argument extraction (EAE) and retrieval-augmented text generation (RAG). Work relevant to this paper includes:

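Document-level EAE:
- Multi-label classification: for example, BERT-CRF models, which first identify argument spans and then assign each span a specific role.
- Question answering (QA): for example, EEQA, which casts document-level EAE as a QA task and extracts event arguments by answering natural-language questions.
- Generation: for example, BART-Gen, which treats EAE as a sequence-to-sequence task and uses BART to generate the arguments in a predefined format.
Retrieval-augmented text generation (RAG):
- Dialogue generation: for example, Retrieve-and-Refine, which uses retrieved memory to guide dialogue generation.
- Machine translation: for example, augmenting neural machine translation models with similar translation fragments.
- Information extraction: for example, using open-domain knowledge (such as Wikipedia) as a retrieval source for distantly supervised relation extraction.
Retrieval augmentation in related extraction tasks:
- Named entity recognition (NER): for example, using retrieved memory to guide NER in low-resource settings.
- Relation extraction: for example, treating relation extraction as an open-book exam and improving performance through retrieval-enhanced prompt tuning.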
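
The QA and generation formulations listed above differ mainly in how the extraction target is posed to the model. The sketch below shows the two framings on one toy event. No model is involved, and the question wording, the template, and the role names are hypothetical; they are not the actual EEQA questions or BART-Gen templates.

```python
import re

PLACEHOLDER = re.compile(r"<(\w+)>")

def qa_prompts(trigger, roles):
    """QA framing: one natural-language question per role."""
    return {r: f"What is the {r} in the event triggered by '{trigger}'?" for r in roles}

def fill_template(template, arguments):
    """Generation framing: the target sequence is the template with roles filled in."""
    out = template
    for role, value in arguments.items():
        out = out.replace(f"<{role}>", value)
    return out

def parse_filled(template, filled):
    """Recover role -> argument from a filled template; unfilled roles are dropped."""
    parts = PLACEHOLDER.split(template)  # alternates: literal, role, literal, ...
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
    match = re.fullmatch(regex, filled)
    if match is None:
        return {}
    return {r: v for r, v in match.groupdict().items() if v != f"<{r}>"}

# Hypothetical template for an attack event.
TEMPLATE = "<attacker> attacked <target> at <place>"

questions = qa_prompts("attacked", ["attacker", "target", "place"])
filled = fill_template(TEMPLATE, {"attacker": "rebels", "target": "the convoy"})
arguments = parse_filled(TEMPLATE, filled)
print(questions["target"])
print(filled)
print(arguments)
```

A placeholder left unfilled in the output corresponds to a role with no argument in the document, which is one way the generation framing can represent sparse arguments.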

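These studies show how retrieval augmentation can improve model performance across natural language processing (NLP) tasks, particularly on complex and sparse data. This paper extends that line of work with new retrieval strategies and a hybrid retrieval-augmentation method.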