FLARE

Active Retrieval Augmented Generation

Authors: Zhengbao Jiang ; Frank F. Xu ; Luyu Gao ; Zhiqing Sun ; Qian Liu ; Jane Dwivedi-Yu ; Yiming Yang ; Jamie Callan ; Graham Neubig

Summary: Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and create factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout generation is essential. In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic method which iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens. We test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method. Code and datasets are available at https://github.com/jzbjyb/FLARE.

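Q: What problem does this paper try to solve?

A: The paper addresses hallucination in large language models (LMs): their tendency to generate factually inaccurate content. The authors augment the LM by retrieving information from external knowledge resources during generation. Specifically, they propose Forward-Looking Active REtrieval augmented generation (FLARE), a method that actively decides when and what to retrieve over the course of generation. FLARE iteratively predicts the upcoming sentence to anticipate future content; if that tentative sentence contains low-confidence tokens, it is used as a query to retrieve relevant documents, and the sentence is regenerated with them. The method targets long-form generation tasks, where information has to be gathered continually throughout generation.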
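The loop described in this answer can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `flare_generate`, the callables `next_sentence` and `retrieve`, the toy stubs, and the threshold value of 0.5 are all hypothetical names and choices made for this example. It assumes the LM can return a per-token probability for each tentative sentence, and it uses the whole tentative sentence as the retrieval query.

```python
from typing import Callable, List, Tuple

# One tentative sentence as (token, probability) pairs; an empty list means "done".
Sentence = List[Tuple[str, float]]


def flare_generate(
    question: str,
    next_sentence: Callable[[str, List[str]], Sentence],  # hypothetical LM wrapper
    retrieve: Callable[[str], List[str]],                  # hypothetical retriever
    threshold: float = 0.5,                                # assumed confidence cutoff
    max_sentences: int = 10,
) -> str:
    """Generate sentence by sentence, retrieving only when the draft looks uncertain."""
    answer: List[str] = []
    for _ in range(max_sentences):
        context = " ".join([question] + answer)
        # Step 1: draft the upcoming sentence without any retrieved documents.
        draft = next_sentence(context, [])
        if not draft:
            break
        # Step 2: if any token is low-confidence, use the draft as a query,
        # retrieve documents, and regenerate the sentence with them.
        if any(prob < threshold for _, prob in draft):
            query = " ".join(token for token, _ in draft)
            docs = retrieve(query)
            draft = next_sentence(context, docs) or draft
        answer.append(" ".join(token for token, _ in draft))
    return " ".join(answer)


# Toy stand-ins for the LM and the retriever, used only to exercise the loop.
def toy_lm(context: str, docs: List[str]) -> Sentence:
    if "1889" in context:
        return []  # the answer is already in the context, so stop
    if docs:
        return [("It", 0.9), ("opened", 0.9), ("in", 0.9), ("1889.", 0.95)]
    return [("It", 0.9), ("opened", 0.9), ("in", 0.9), ("1900.", 0.2)]


def toy_retriever(query: str) -> List[str]:
    return ["The Eiffel Tower opened in 1889."]


print(flare_generate("When did the Eiffel Tower open?", toy_lm, toy_retriever))
```

In this toy run the first draft ends in a low-probability token, so the draft is sent to the retriever and the sentence is regenerated with the returned document. A confident draft is accepted as is, without a retrieval call, which is the sense in which retrieval is "active".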

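Q: What related work is there?

A: Related work falls mainly into the following areas:

Retrieval-Augmented Language Models:
- This line of work studies how to combine a retrieval component with a language model to improve performance on specific tasks, especially those that require external knowledge. Examples include Chen et al. (2017), Guu et al. (2020), and Lewis et al. (2020).
Multi-hop Question Answering:
- Multi-hop QA tasks require the model to answer questions through retrieval and reasoning, which usually involves complex information retrieval and processing. Examples include Kwiatkowski et al. (2019) and Joshi et al. (2017).
Open-Domain Summarization:
- Open-domain summarization requires the model to extract information from large amounts of unstructured data and generate a summary, which usually calls for the ability to retrieve and understand external knowledge. Examples include Cohen et al. (2021) and Hayashi et al. (2021).
Chain-of-Thought Reasoning:
- Chain-of-thought reasoning solves a problem by building a chain of reasoning step by step, which is particularly useful for complex problems. Examples include Wei et al. (2022) and Ho et al. (2020).
Iterative and Adaptive Retrieval:
- This line of work studies how to retrieve and update iteratively during generation to improve the quality and relevance of the generated content. Examples include Peng et al. (2023) and Zhang et al. (2023).
Browser-Enhanced Language Models:
- These models improve their factuality by interacting with a browser, for example WebGPT (Nakano et al., 2021) and WebCPM (Qin et al., 2023).
Improving Large Language Models:
- This line of work studies how to improve large language models with external knowledge and automated feedback. Examples include Baolin Peng et al. (2023) and Yury Zemlyanskiy et al. (2022).

These studies provide the theoretical and technical background for FLARE, which builds on them with a new active retrieval strategy intended to improve performance on long-form generation tasks.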

Q: How does the paper solve this problem?