PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter

https://papers.cool/arxiv/2310.18347

Authors: Haoyan Yang ; Zhitao Li ; Yong Zhang ; Jianzong Wang ; Ning Cheng ; Ming Li ; Jing Xiao

Summary: The Retrieval Question Answering (ReQA) task employs the retrieval-augmented framework, composed of a retriever and generator. The generator formulates the answer based on the documents retrieved by the retriever. Incorporating Large Language Models (LLMs) as generators is beneficial due to their advanced QA capabilities, but they are typically too large to be fine-tuned with budget constraints while some of them are only accessible via APIs. To tackle this issue and further improve ReQA performance, we propose a trainable Pluggable Reward-Driven Contextual Adapter (PRCA), keeping the generator as a black box. Positioned between the retriever and generator in a Pluggable manner, PRCA refines the retrieved information by operating in a token-autoregressive strategy via maximizing rewards of the reinforcement learning phase. Our experiments validate PRCA's effectiveness in enhancing ReQA performance on three datasets by up to 20% improvement to fit black-box LLMs into existing frameworks, demonstrating its considerable potential in the LLMs era.

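Q: What problem does this paper try to solve?

A: The paper asks how to integrate large language models (LLMs) as generators under a limited compute budget in order to improve performance on the Retrieval Question Answering (ReQA) task. Specifically, it addresses the following challenges: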

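Difficulty of fine-tuning LLMs: LLMs usually have a very large number of parameters, which makes them hard to fine-tune with limited compute resources.
Closed models: some LLMs (such as GPT-4) are closed source, so they cannot be fine-tuned.
API access restrictions: for LLMs that can be reached only through an API, the internal information needed to compute a loss function, such as final-layer logits, may be unavailable.
Input length limits: LLMs can lose efficiency and accuracy on long inputs, especially when a large number of retrieved documents must be processed.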

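To address these problems, the paper proposes a trainable Pluggable Reward-Driven Contextual Adapter (PRCA). Without modifying the retriever or the generator, PRCA refines the retrieved information by maximizing the reward in a reinforcement learning phase, which improves performance on the ReQA task.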
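The placement of the adapter between retriever and generator can be sketched as below. This is only an illustrative sketch, not the authors' implementation: `adapter_step`, `toy_adapter`, `toy_generator`, and `token_f1` are hypothetical names, the token-overlap F1 reward is an assumption chosen for simplicity, and the reinforcement learning update that would consume the reward is omitted. The point it shows is that the generator is only ever called as an opaque function, so no logits or gradients from it are needed.

```python
import re
from typing import Callable, List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, used here as a stand-in reward (an assumption)."""
    pred, ref = tokens(prediction), tokens(reference)
    if not pred or not ref:
        return 0.0
    remaining = list(ref)
    overlap = 0
    for tok in pred:
        if tok in remaining:
            remaining.remove(tok)  # count each reference token at most once
            overlap += 1
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def adapter_step(
    question: str,
    retrieved_docs: List[str],
    adapter: Callable[[str, List[str]], str],
    generator: Callable[[str], str],
    gold_answer: str,
) -> Tuple[str, str, float]:
    """One forward pass: retrieved docs -> adapter -> black-box generator -> reward."""
    context = adapter(question, retrieved_docs)  # the only trainable component
    answer = generator(f"Context: {context}\nQuestion: {question}")  # opaque call
    reward = token_f1(answer, gold_answer)  # scalar signal for an RL update
    return context, answer, reward


def toy_adapter(question: str, docs: List[str]) -> str:
    """Hypothetical adapter: keep the document sharing the most words with the question."""
    q = set(tokens(question))
    return max(docs, key=lambda d: len(q & set(tokens(d))))


def toy_generator(prompt: str) -> str:
    """Hypothetical stand-in for an API-only LLM: returns the last word of the context line."""
    context_line = prompt.splitlines()[0]
    return context_line.rstrip(".").split()[-1]


docs = ["Berlin is the capital of Germany.", "The capital of France is Paris."]
context, answer, reward = adapter_step(
    "What is the capital of France?", docs, toy_adapter, toy_generator, "Paris"
)
print(context, answer, reward)
```

In a real setup the adapter would be a trainable sequence model and the reward would drive a policy-gradient style update of the adapter alone; both the retriever and the generator stay frozen.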

Q: What related work is there?

A: Related work falls mainly into the following areas:

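The potential of LLMs as black-box models: studies show that LLMs perform well on downstream question answering tasks, even with little or no training data. Because they are not open source and have very large parameter counts, they tend to be treated as black boxes. For example, GPT-4 and PaLM show strong question answering performance, but since they are closed source and accessible only through APIs, they are treated as black-box models.
Retrieval-augmented frameworks: researchers have developed and applied a range of retrieval-augmented methods to improve ReQA performance. Early work used retrievers based on statistical similarity, such as TF-IDF and BM25. Vectorization methods followed, which represent questions and documents as vectors and use vector similarity as the key retrieval criterion. There are also contrastive-learning-based methods, such as SimCSE and Contriever, and sentence-level semantic models, such as Sentence-BERT.
Further development of retrieval-augmented frameworks: researchers have also explored integrating the retrieval and generation components within the ReQA framework. For example, REALM and RAG improve performance by jointly training the retriever and the generator. More recently, Atlas and RETRO were proposed; they achieve performance comparable to large-scale models such as PaLM and GPT-3 with significantly fewer parameters.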
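As a concrete illustration of the statistical-similarity retrievers mentioned above, the following is a minimal TF-IDF retriever ranked by cosine similarity, written with the standard library only. It is a sketch under stated assumptions, not the retriever used in the paper: the function names (`tfidf_vectors`, `retrieve`) are hypothetical, and the smoothed IDF formula is one common variant chosen here for convenience. BM25 or a dense encoder such as Sentence-BERT would replace the scoring step while keeping the same rank-and-take-top-k shape.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Build sparse TF-IDF vectors for the documents, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vectors = [
        {tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the question."""
    vectors, idf = tfidf_vectors(docs)
    # Tokens unseen in the corpus get weight 0 and do not affect the ranking.
    q_vec = {
        tok: tf * idf.get(tok, 0.0)
        for tok, tf in Counter(tokenize(question)).items()
    }
    ranked = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True
    )
    return [docs[i] for i in ranked[:k]]


corpus = [
    "Berlin is the capital of Germany.",
    "The capital of France is Paris.",
    "Bananas are rich in potassium.",
]
top = retrieve("What is the capital of France?", corpus, k=2)
print(top)
```

The top-k documents returned by a retriever like this are what a downstream component would receive as context for the generator.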

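This related work provides the background and foundation for the PRCA method proposed in the paper, particularly on how to use LLMs effectively and how to improve retrieval-augmented frameworks to raise question answering performance.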