LLM-Knowledge-Boundary

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation

Authors: Ruiyang Ren ; Yuhao Wang ; Yingqi Qu ; Wayne Xin Zhao ; Jing Liu ; Hao Tian ; Hua Wu ; Ji-Rong Wen ; Haifeng Wang

Summary: Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. We also find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.

Q: What problem does this paper try to solve?

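A: The paper examines how well large language models (LLMs) perceive the boundaries of their own factual knowledge in open-domain question answering (QA), particularly when they are combined with retrieval augmentation. Specifically, it focuses on the following questions: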

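How well LLMs perceive their factual knowledge boundaries: to what extent LLMs recognize the limits of their knowledge when answering questions.
How retrieval augmentation affects LLMs: how retrieval augmentation changes LLM behavior on open-domain QA, including both the ability to answer questions and the ability to judge whether their own answers are correct.
How supporting documents with different characteristics affect LLMs: how properties of the supporting documents (such as relevance and whether they contain the correct answer) influence answer generation and the perception of knowledge boundaries.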

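The paper experimentally analyzes LLMs' QA performance, priori judgement, and posteriori judgement both without retrieval augmentation (the normal setting) and with it (the retrieval-augmented setting), and reports several key findings. These findings help clarify the limitations of LLMs on knowledge-intensive tasks and offer insight into how retrieval augmentation can improve their performance.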
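
As a rough illustration of the query types described above (answering, priori judgement, and posteriori judgement, each with or without retrieved documents), the following Python sketch builds the corresponding prompts. The prompt wording, the function names, and the example question are assumptions made for illustration; they are not taken from the paper or its repository.

```python
from typing import List, Optional


def _context_block(passages: Optional[List[str]]) -> List[str]:
    """Return numbered supporting documents, or nothing in the normal setting."""
    if not passages:
        return []
    lines = ["Supporting documents:"]
    lines.extend(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return lines


def build_qa_prompt(question: str, passages: Optional[List[str]] = None) -> str:
    """QA prompt: normal setting if passages is None, retrieval-augmented otherwise."""
    lines = _context_block(passages)
    lines += [f"Question: {question}", "Answer:"]
    return "\n".join(lines)


def build_priori_prompt(question: str, passages: Optional[List[str]] = None) -> str:
    """Priori judgement: ask, before answering, whether the model can answer."""
    lines = _context_block(passages)
    lines += [
        f"Question: {question}",
        "Can you answer this question accurately? Reply 'yes' or 'no'.",  # hypothetical wording
    ]
    return "\n".join(lines)


def build_posteriori_prompt(
    question: str, answer: str, passages: Optional[List[str]] = None
) -> str:
    """Posteriori judgement: ask whether an already generated answer is correct."""
    lines = _context_block(passages)
    lines += [
        f"Question: {question}",
        f"Proposed answer: {answer}",
        "Is the proposed answer correct? Reply 'yes' or 'no'.",  # hypothetical wording
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    q = "Who wrote Hamlet?"  # illustrative question, not from the paper's datasets
    docs = ["Hamlet is a tragedy written by William Shakespeare."]
    print(build_qa_prompt(q))                       # normal setting
    print(build_qa_prompt(q, docs))                 # retrieval-augmented setting
    print(build_priori_prompt(q, docs))
    print(build_posteriori_prompt(q, "William Shakespeare", docs))
```

Keeping the supporting documents in one shared helper means the normal and retrieval-augmented settings differ only in whether passages are supplied, which makes the two conditions easy to compare side by side.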

Q: What related work is there?

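A: The paper cites a number of studies related to LLMs' perception of knowledge boundaries in open-domain question answering (QA). Some of the related work includes: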

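Petroni et al. (2021): Introduced KILT (Knowledge Intensive Language Tasks), a benchmark for evaluating models on knowledge-intensive language tasks.
Chen et al. (2017): Answered open-domain questions by reading Wikipedia, showing how an external text corpus can be used for open-domain QA.
Devlin et al. (2019): Introduced BERT (Bidirectional Encoder Representations from Transformers), a pre-trained language model that advanced natural language processing.
Lewis et al. (2020): Introduced BART (Bidirectional and Auto-Regressive Transformers), a denoising sequence-to-sequence pre-training model for natural language generation, translation, and comprehension.
Raffel et al. (2020): Explored the limits of transfer learning with a unified text-to-text transformer.
Karpukhin et al. (2020): Proposed Dense Passage Retrieval for open-domain QA, showing how retrieval can improve model performance.
Gao and Callan (2021): Proposed an unsupervised, corpus-aware language model pre-training method for dense passage retrieval.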