Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
https://papers.cool/arxiv/2307.11019
Authors: Ruiyang Ren ; Yuhao Wang ; Yingqi Qu ; Wayne Xin Zhao ; Jing Liu ; Hao Tian ; Hua Wu ; Ji-Rong Wen ; Haifeng Wang
Summary: Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. Additionally, we find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.
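A: This paper examines how well large language models (LLMs) perceive the boundaries of their own factual knowledge in open-domain question answering (QA), and in particular how they behave when combined with retrieval augmentation. The study is organized around three primary research questions.
Through experiments, the paper analyzes the QA performance, priori judgement, and posteriori judgement of LLMs both without retrieval augmentation (the normal setting) and with it (the retrieval-augmented setting), and reports several key findings. These findings help clarify the limitations of LLMs on knowledge-intensive tasks and offer insight into how retrieval augmentation can improve LLM performance.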
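Q: What related work is there?
A: The paper cites a number of studies related to how large language models (LLMs) perceive their knowledge boundaries in open-domain question answering (QA). Some of the related work is listed below: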