
WebCPM: Interactive Web Search for Chinese Long-form Question Answering

Authors: Yujia Qin ; Zihan Cai ; Dian Jin ; Lan Yan ; Shihao Liang ; Kunlun Zhu ; Yankai Lin ; Xu Han ; Ning Ding ; Huadong Wang ; Ruobing Xie ; Fanchao Qi ; Zhiyuan Liu ; Maosong Sun ; Jie Zhou

Summary: Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses. The de facto paradigm of LFQA necessitates two procedures: information retrieval, which searches for relevant supporting facts, and information synthesis, which integrates these facts into a coherent answer. In this paper, we introduce WebCPM, the first Chinese LFQA dataset. One unique feature of WebCPM is that its information retrieval is based on interactive web search, which engages with a search engine in real time. Following WebGPT, we develop a web search interface. We recruit annotators to search for relevant information using our interface and then answer questions. Meanwhile, the web search behaviors of our annotators would be recorded. In total, we collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions. We fine-tune pre-trained language models to imitate human behaviors for web search and to generate answers based on the collected facts. Our LFQA pipeline, built on these fine-tuned models, generates answers that are no worse than human-written ones in 32.5% and 47.5% of the cases on our dataset and DuReader, respectively.
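As a quick sanity check on the dataset statistics quoted in the summary, the per-question averages follow from simple division. The snippet below is only an illustration; the variable names are made up for this note and do not come from the paper.

```python
# Dataset totals as quoted in the summary above.
num_qa_pairs = 5_500
num_supporting_facts = 14_315
num_search_actions = 121_330

# Average number of supporting facts and search actions per question.
facts_per_question = num_supporting_facts / num_qa_pairs
actions_per_question = num_search_actions / num_qa_pairs

print(f"{facts_per_question:.2f} supporting facts per question")
print(f"{actions_per_question:.2f} search actions per question")
```

So each question comes with roughly 2.6 supporting facts and about 22 recorded search actions on average.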


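Q: What problem does this paper try to solve?

A: The paper addresses information retrieval and information synthesis in long-form question answering (LFQA). Specifically, it focuses on how to gather relevant information through interactive web search and then synthesize a detailed, coherent answer from it. Conventional LFQA methods typically rely on non-interactive retrieval, which uses the original question as the query and retrieves a large amount of unfiltered information. Humans facing a complex question, by contrast, search the web interactively in real time: they decompose the question, ask sub-questions in sequence, and search iteratively to deepen their understanding of the topic and refine the results. The paper introduces WebCPM (Web-based Chinese Pre-trained Models), the first Chinese LFQA dataset built on interactive web search, with the aim of generating answers as good as or better than human-written ones by imitating human web search behavior.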
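The contrast between the two retrieval styles can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the in-memory `CORPUS`, the keyword-overlap `search` function, and the hand-written sub-queries are all assumptions made for the example. In the real setting the queries go to a live search engine, and the sub-queries come from a human annotator or a fine-tuned model.

```python
# Toy corpus standing in for the web (hypothetical data).
CORPUS = {
    "doc1": "green tea contains caffeine and antioxidants",
    "doc2": "caffeine can disturb sleep when taken late in the day",
    "doc3": "antioxidants may help protect cells from damage",
    "doc4": "black tea is fully oxidized during processing",
}

def search(query, top_k=1):
    """Rank documents by the number of words they share with the query."""
    q_words = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(q_words & set(text.split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

def non_interactive(question, top_k=1):
    """Non-interactive retrieval: the original question is the only query."""
    return search(question, top_k)

def interactive(sub_queries, top_k=1):
    """Interactive retrieval: issue sub-queries in sequence, keep new facts."""
    facts = []
    for query in sub_queries:
        for doc_id in search(query, top_k):
            if doc_id not in facts:
                facts.append(doc_id)
    return facts

question = "is green tea good or bad for sleep"
single = non_interactive(question)
multi = interactive(["green tea contains", "caffeine sleep", "antioxidants cells"])
print(single)  # documents found with one query
print(multi)   # documents found by decomposing the question
```

In this toy setup the single query only surfaces the document about green tea itself, while the sequence of sub-queries also reaches the documents about caffeine and antioxidants, which is the kind of coverage the interactive approach is after.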

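Q: What related work is there?

A: Related work falls into the following areas:

Retrieval and synthesis in long-form question answering (LFQA): Earlier work typically relies on a local knowledge base (such as Wikipedia) for information retrieval. More recently, researchers have turned to the whole web as the knowledge source, which broadens the range of sources and provides real-time coverage of up-to-date knowledge. How to organize the retrieved facts into a sound, nuanced long-form answer, however, remains under-explored.
WebGPT: WebGPT is one of the pioneering works supporting interactive web search. It builds a web search interface on top of Microsoft Bing, recruits annotators to collect information through that interface to answer questions, and then fine-tunes GPT-3 to imitate human web search behavior and to organize the collected information into answers. Although WebGPT performs well on LFQA, its interface, dataset, and trained models are not public, which makes it hard for the community to understand the challenges of interactive web search and to continue exploring this direction.
Tool learning: Recent work shows that pre-trained language models (PLMs) have the potential to manipulate tools, an ability known as tool learning. PLMs can make sequential decisions in complex interactive environments, for example planning in robotic tasks, operating search engines, and shopping on e-commerce websites. By drawing on the rich world knowledge acquired during pre-training, PLMs can perform grounded actions that interact with the real world.
Other related datasets: Datasets such as CMRC, C3, and GopherCite touch on long-form question answering to varying degrees, but they differ from WebCPM in language, public availability, support for free-form answers, inclusion of web search behavior, and the average lengths of questions, supporting facts, and answers.

This body of work provides the background and points of comparison for WebCPM, and also highlights both the potential and the challenges of interactive web search for long-form question answering. By providing a public dataset and platform, WebCPM aims to encourage further research in this area.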
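Fine-tuning a model to imitate recorded human search behavior, as described for WebGPT above, is commonly framed as predicting the next action from the question and the history so far. The sketch below shows one way a recorded trace could be unrolled into such training pairs. The action strings, the trace format, and the `trace_to_examples` helper are hypothetical and are not taken from either paper.

```python
def trace_to_examples(question, actions):
    """Turn one recorded trace into (context, next_action) training pairs."""
    examples = []
    for i, action in enumerate(actions):
        # The context holds the question plus every action taken before step i.
        history = " ; ".join(actions[:i]) if i > 0 else "(none)"
        context = f"question: {question} | history: {history}"
        examples.append((context, action))
    return examples

# A made-up trace of one annotator session (hypothetical action names).
trace = [
    "search: green tea sleep",
    "open: result 1",
    "quote: caffeine can disturb sleep",
    "finish",
]
examples = trace_to_examples("is green tea bad for sleep", trace)
for context, target in examples:
    print(context, "->", target)
```

Each pair asks the model to produce the annotator's next step given what has happened so far, so a trace with four actions yields four training examples.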

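Q: How does the paper solve this problem?

A: The paper tackles interactive web search for long-form question answering (LFQA) through the following steps: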