RA-DIT: Retrieval-Augmented Dual Instruction Tuning
https://papers.cool/arxiv/2310.01352
Authors: Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih
Summary: Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option by retrofitting any LLM with retrieval capabilities. Our approach operates in two distinct fine-tuning steps: (1) one updates a pre-trained LM to better use retrieved information, while (2) the other updates the retriever to return more relevant results, as preferred by the LM. By fine-tuning over tasks that require both knowledge utilization and contextual awareness, we demonstrate that each stage yields significant performance improvements, and using both leads to additional gains. Our best model, RA-DIT 65B, achieves state-of-the-art performance across a range of knowledge-intensive zero- and few-shot learning benchmarks, significantly outperforming existing in-context RALM approaches by up to +8.9% in 0-shot setting and +1.4% in 5-shot setting on average.
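To make the two fine-tuning steps concrete, below is a minimal PyTorch-style sketch of how each objective could look. It assumes a Hugging Face-style causal LM (whose forward pass returns a token-averaged loss) and a retriever exposing a hypothetical `retrieve_top_k(query, k)` method that returns passages together with differentiable relevance scores; the prompt template and all helper names are illustrative assumptions, not the authors' released code.

```python
# Sketch of the two RA-DIT fine-tuning steps (LM-ft and R-ft).
# Assumptions: a Hugging Face-style causal LM, and a retriever whose
# retrieve_top_k(query, k) returns (passages, differentiable scores).

import torch
import torch.nn.functional as F

def build_prompt(passage: str, instruction: str) -> str:
    # Illustrative template: prepend a retrieved passage to the instruction
    # so the LM learns to ground its answer in the provided context.
    return f"Background: {passage}\n\n{instruction}\n"

def lm_answer_logprob(lm, tokenizer, prompt: str, answer: str) -> torch.Tensor:
    # Log-likelihood of the answer tokens given the (retrieval-augmented)
    # prompt, with the prompt tokens masked out of the loss.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt positions
    out = lm(input_ids=full_ids, labels=labels)
    n_answer_tokens = (labels != -100).sum()
    return -out.loss * n_answer_tokens  # approximate total answer log-prob

def lm_finetune_step(lm, tokenizer, retriever, instruction, answer, k=3):
    # Step 1 (LM-ft): update the LM to make better use of retrieved chunks.
    # One plausible choice: treat each retrieved chunk as a separate instance.
    passages, _ = retriever.retrieve_top_k(instruction, k)
    loss = 0.0
    for p in passages:
        loss = loss - lm_answer_logprob(lm, tokenizer, build_prompt(p, instruction), answer)
    return loss / k

def retriever_finetune_step(lm, tokenizer, retriever, instruction, answer, k=3):
    # Step 2 (R-ft): update the retriever so its score distribution over the
    # top-k chunks matches the LM's preference, i.e. how much each chunk helps
    # the (frozen) LM predict the correct answer.
    passages, retriever_scores = retriever.retrieve_top_k(instruction, k)
    with torch.no_grad():
        lm_scores = torch.stack([
            lm_answer_logprob(lm, tokenizer, build_prompt(p, instruction), answer)
            for p in passages
        ])
    p_lm = F.softmax(lm_scores, dim=0)                # LM preference over chunks
    log_p_r = F.log_softmax(retriever_scores, dim=0)  # retriever distribution
    return F.kl_div(log_p_r, p_lm, reduction="sum")   # KL(LM pref || retriever)
```

In this sketch the retriever step keeps the LM frozen and backpropagates only through the retriever's scores; the KL formulation is one way to realize "the retriever returns results preferred by the LM", aligning the retriever's distribution with how much each chunk improves the LM's answer likelihood. The exact loss used in the paper may differ in details such as normalization and the number of chunks.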
A: The problem this paper tries to solve is how to effectively build and improve retrieval-augmented language models (RALMs). Specifically, it targets the following challenges: existing approaches either require expensive retrieval-specific modifications to LM pre-training, or integrate the data store post hoc, which leads to suboptimal performance.
Overall, the paper's goal is to propose an effective way to retrofit large language models with retrieval capabilities, so that they can better exploit external knowledge across knowledge-intensive tasks while avoiding the high computational cost and performance limitations of existing approaches.
A: The paper discusses a number of prior works related to retrieval-augmented language models (RALMs). Some of the key related studies are: