Understanding Retrieval Augmentation for Long-Form Question Answering

Authors: Hung-Ting Chen ; Fangyuan Xu ; Shane Arora ; Eunsol Choi

Summary: We present a study of retrieval-augmented language models (LMs) on long-form question answering. We analyze how retrieval augmentation impacts different LMs by comparing answers generated by different models using the same evidence documents, and how differing quality of the retrieved document set impacts the answers generated by the same LM. We study various attributes of generated answers (e.g., fluency, length, variance), with an emphasis on the attribution of generated long-form answers to in-context evidence documents. We collect human annotations of answer attribution and evaluate methods for automatically judging attribution. Our study provides new insights on how retrieval augmentation impacts long, knowledge-rich text generation of LMs. We further identify attribution patterns for long text generation and analyze the main culprits of attribution errors. Together, our analysis reveals how retrieval augmentation impacts long, knowledge-rich text generation and provides directions for future work.

Q: What problem does this paper try to solve?

A: This paper studies how retrieval-augmented language models (LMs) perform on long-form question answering (LFQA). It addresses the following points:

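Effect of retrieval augmentation on different LMs: compares answers generated with the same set of evidence documents to analyze how retrieval augmentation affects answer generation across language models.
Effect of evidence document quality on generated answers: examines how retrieved document sets of differing quality affect the answers generated by the same LM.
Attribute analysis of generated answers: studies various attributes of the generated answers (e.g., fluency, length, variance), with an emphasis on attributing the generated long-form answers to the in-context evidence documents.
Evaluation of answer attribution: collects human annotations of answer attribution and evaluates methods for judging attribution automatically, yielding new insights into how retrieval augmentation affects long, knowledge-rich text generation by LMs.
Identification of attribution patterns: identifies attribution patterns in long-form text generation and analyzes the main causes of attribution errors.
Directions for future work: based on the analysis, proposes directions for future research, including how to improve LMs so that they make better use of in-context evidence documents and how to improve the performance of the retrieval component.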

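Overall, the paper aims to build a deeper understanding of how retrieval-augmented language models perform on long-form question answering, particularly in answer generation and attribution, and of how improving the models and the retrieval system can raise the quality and reliability of answers.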

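Q: What related work is there?

A: The paper mentions the following work related to retrieval-augmented language models (LMs) and long-form question answering (LFQA):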

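Retrieval-augmented language models:
- Nakano et al. (2021) presented retrieval as a powerful tool for providing LMs with up-to-date, relevant information.
- Ram et al. (2023) and Shi et al. (2023) studied the performance gains of retrieval-augmented generation across multiple tasks.
- Wang et al. (2023) studied the effect of retrieval on open-ended text generation.
Long-form question answering (LFQA):
- Fan et al. (2019) and Stelmakh et al. (2022) explored the challenges of LFQA, which requires models to generate paragraph-length answers to complex, open-ended questions.
- Krishna et al. (2021) reported that retrieval-augmented models often ignore the retrieved documents during generation.
Answer attribution:
- Rashkin et al. (2021) proposed the AIS (Attributable to Identified Sources) framework for evaluating whether system-generated text can be derived from a given knowledge source.
- Bohnet et al. (2022) and Yue et al. (2023) studied methods for evaluating attribution automatically, the former using entailment models and the latter by prompting LLMs.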
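
To make the idea of automatic attribution judging more concrete, here is a minimal sketch of sentence-level attribution checking. It is not the method of this paper or of any cited work. All names (`split_sentences`, `token_overlap_judge`, `attribution_report`, `supported_fraction`), the 0.8 threshold, and the choice to concatenate all evidence documents into one premise are assumptions made for illustration. The default judge is a crude token-overlap stand-in, not an entailment model; in practice one would presumably pass in an NLI model or an LLM prompt with the same `(premise, hypothesis) -> score` shape.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap_judge(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (NOT an entailment model): the fraction of
    hypothesis tokens that also occur in the premise."""
    hyp = _tokens(hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & _tokens(premise)) / len(hyp)


def attribution_report(
    answer: str,
    documents: List[str],
    judge: Callable[[str, str], float] = token_overlap_judge,
    threshold: float = 0.8,  # hypothetical cutoff, chosen for the example
) -> List[Tuple[str, bool]]:
    """Label each answer sentence as supported or not by the evidence.

    Assumption: the evidence documents are concatenated into one premise.
    """
    evidence = " ".join(documents)
    return [
        (sentence, judge(evidence, sentence) >= threshold)
        for sentence in split_sentences(answer)
    ]


def supported_fraction(report: List[Tuple[str, bool]]) -> float:
    """Share of answer sentences judged as supported (0.0 for an empty answer)."""
    if not report:
        return 0.0
    return sum(1 for _, ok in report if ok) / len(report)


if __name__ == "__main__":
    docs = ["The Eiffel Tower is in Paris.", "It was completed in 1889."]
    answer = "The Eiffel Tower is in Paris. It is painted every seven years."
    for sentence, ok in attribution_report(answer, docs):
        print("supported  " if ok else "unsupported", sentence)
```

Keeping the judge as a plain callable means an entailment model and an LLM-prompting judge can be swapped in and compared against the same human annotations without touching the rest of the pipeline.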