Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
https://papers.cool/arxiv/2212.10509
Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal
Summary: Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source helps LLMs, we observe that this one-step retrieve-and-read approach is insufficient for multi-step QA. Here, *what to retrieve* depends on *what has already been derived*, which in turn may depend on *what was previously retrieved*. To address this, we propose IRCoT, a new approach for multi-step QA that interleaves retrieval with steps (sentences) in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. Using IRCoT with GPT3 substantially improves retrieval (up to 21 points) as well as downstream QA (up to 15 points) on four datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. We observe similar substantial gains in out-of-distribution (OOD) settings as well as with much smaller models such as Flan-T5-large without additional training. IRCoT reduces model hallucination, resulting in factually more accurate CoT reasoning. Code, data, and prompts are available at https://github.com/stonybrooknlp/ircot
A: The problem this paper tries to solve is how to improve the performance of large language models (LLMs) on knowledge-intensive multi-step question answering (QA) tasks. Specifically, it focuses on two challenges: the knowledge needed to answer may be unavailable to the LLM or out-of-date within its parameters, and the standard one-step retrieve-and-read approach is insufficient for multi-step QA, since what to retrieve depends on what has already been derived.
To address these problems, the paper proposes a new method called IRCoT (Interleaved Retrieval guided by Chain-of-Thought), which interleaves retrieval with the steps of CoT (Chain-of-Thought) reasoning so that each guides the other, improving both retrieval quality and downstream QA performance. The method uses the CoT to guide retrieval and, in turn, uses the retrieved results to improve the CoT, yielding substantial gains in retrieval and QA across multiple datasets.
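The interleaving loop described above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not the paper's released implementation: `retrieve` and `generate_step` are hypothetical stand-ins for the paper's BM25 retriever and prompted LLM, and the "answer is" termination check is a simplified version of the paper's stopping heuristic. The toy question is the paper's running example ("In what country was Lost Gravity manufactured?").

```python
def ircot(question, retrieve, generate_step, max_steps=8):
    """Sketch of the IRCoT loop: alternate CoT generation and retrieval.

    retrieve(query)                         -> list of paragraph strings
    generate_step(question, paragraphs, cot) -> next CoT sentence (str)
    """
    # Initial retrieval uses the question itself as the query.
    paragraphs = list(retrieve(question))
    cot = []
    for _ in range(max_steps):
        # Generate the next CoT sentence from everything gathered so far.
        step = generate_step(question, paragraphs, cot)
        cot.append(step)
        if "answer is" in step.lower():  # simplified termination heuristic
            break
        # Use the latest CoT sentence as the next retrieval query,
        # adding any newly retrieved paragraphs to the working set.
        for p in retrieve(step):
            if p not in paragraphs:
                paragraphs.append(p)
    return cot, paragraphs


# Toy stand-ins to exercise the loop (NOT the paper's retriever/LLM):
docs = {
    "Lost Gravity": "Lost Gravity was made by Mack Rides.",
    "Mack Rides": "Mack Rides is a company from Germany.",
}

def toy_retrieve(query):
    return [text for key, text in docs.items() if key in query]

def toy_generate(question, paragraphs, cot):
    if not cot:
        return "Lost Gravity was made by Mack Rides."
    return "Mack Rides is a German company, so the answer is Germany."

cot, paras = ircot(
    "In what country was Lost Gravity manufactured?",
    toy_retrieve, toy_generate,
)
```

Note how the second paragraph ("Mack Rides is a company from Germany.") only becomes retrievable after the first CoT sentence mentions Mack Rides; a one-shot retrieval with the original question would never have found it, which is exactly the gap IRCoT is designed to close.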
A: The paper discusses several research areas and specific works related to IRCoT, mainly including: