Making Retrieval-Augmented Language Models Robust to Irrelevant Context
https://papers.cool/arxiv/2310.01558
Authors: Ori Yoran ; Tomer Wolfson ; Ori Ram ; Jonathan Berant
Summary: Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that retrieved information helps model performance when it is relevant, and does not harm performance when it is not. This is particularly important in multi-hop reasoning scenarios, where misuse of irrelevant evidence can lead to cascading errors. However, recent work has shown that retrieval augmentation can sometimes have a negative effect on performance. In this work, we present a thorough analysis on five open-domain question answering benchmarks, characterizing cases when retrieval reduces accuracy. We then propose two methods to mitigate this issue. First, a simple baseline that filters out retrieved passages that do not entail question-answer pairs according to a natural language inference (NLI) model. This is effective in preventing performance reduction, but at a cost of also discarding relevant passages. Thus, we propose a method for automatically generating data to fine-tune the language model to properly leverage retrieved passages, using a mix of relevant and irrelevant contexts at training time. We empirically show that even 1,000 examples suffice to train the model to be robust to irrelevant contexts while maintaining high performance on examples with relevant ones.
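The NLI-based filter described in the summary can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `filter_passages`, the `threshold` value, and especially `toy_entailment_score`, which is a crude token-overlap stand-in and not a real NLI model. In practice a trained NLI model would supply the entailment score; the summary does not specify which model or threshold the authors use.

```python
from typing import Callable, List

# Hypothetical interface: (premise, hypothesis) -> entailment score in [0, 1].
EntailmentScorer = Callable[[str, str], float]


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model (assumption, for illustration only).

    Returns the fraction of hypothesis tokens that also occur in the premise.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return hits / len(hypothesis_tokens)


def filter_passages(
    question: str,
    answer: str,
    passages: List[str],
    scorer: EntailmentScorer = toy_entailment_score,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Keep only the passages judged to entail the question-answer pair."""
    # The retrieved passage is the premise; the question-answer pair is the hypothesis.
    hypothesis = f"{question} {answer}"
    return [p for p in passages if scorer(p, hypothesis) >= threshold]


if __name__ == "__main__":
    retrieved = [
        "Paris is the capital of France",
        "Bananas are yellow",
    ]
    print(filter_passages("capital of France", "Paris", retrieved))
```

The sketch also makes the stated trade-off visible: a strict threshold blocks irrelevant passages, but it can discard relevant passages that the scorer fails to recognise as entailing the answer.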
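A: This paper addresses the problem of making retrieval-augmented language models (RALMs) robust to irrelevant context. Specifically, the goal is to ensure that retrieved information improves model performance when it is relevant and does not harm performance when it is not. This matters most in multi-hop reasoning scenarios, where misusing irrelevant evidence can cause cascading errors. Through a thorough analysis on five open-domain question answering benchmarks, the paper identifies the cases in which retrieval reduces accuracy and proposes two methods to mitigate the problem.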
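The second method, fine-tuning on a mix of relevant and irrelevant contexts, can be sketched as a data-assembly step. This is a hedged illustration only: the function name `build_training_examples`, the `irrelevant_fraction` parameter, and the record layout are assumptions, and the paper's actual data-generation procedure is not described in this summary. The one point taken from the source is that the training set mixes both kinds of context while the target answer stays correct.

```python
import random
from typing import Dict, List, Tuple


def build_training_examples(
    qa_pairs: List[Tuple[str, str]],
    relevant: Dict[str, str],
    irrelevant_pool: List[str],
    irrelevant_fraction: float = 0.5,  # hypothetical mixing ratio
    seed: int = 0,
) -> List[dict]:
    """Pair each question with either its relevant passage or a random irrelevant one.

    The target answer is the gold answer in both cases, so the model is trained
    to use helpful context and to ignore unhelpful context.
    """
    rng = random.Random(seed)
    examples = []
    for question, answer in qa_pairs:
        use_irrelevant = rng.random() < irrelevant_fraction
        context = rng.choice(irrelevant_pool) if use_irrelevant else relevant[question]
        examples.append(
            {
                "question": question,
                "context": context,
                "answer": answer,
                "context_is_relevant": not use_irrelevant,
            }
        )
    return examples


if __name__ == "__main__":
    qa = [("Who wrote Hamlet?", "William Shakespeare")]
    rel = {"Who wrote Hamlet?": "Hamlet is a tragedy written by William Shakespeare."}
    pool = ["The Amazon is the largest rainforest on Earth."]
    for example in build_training_examples(qa, rel, pool):
        print(example)
```

The resulting records would then be formatted into prompts for fine-tuning; the summary reports that roughly 1,000 such examples are enough.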
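A: Related work includes, but is not limited to, the following areas:
These studies provide background on the challenges RALMs face when handling irrelevant context, and they supply the theoretical basis and technical support for the solutions proposed in this paper.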