WebCPM: Interactive Web Search for Chinese Long-form Question Answering
https://papers.cool/arxiv/2305.06849
Authors: Yujia Qin ; Zihan Cai ; Dian Jin ; Lan Yan ; Shihao Liang ; Kunlun Zhu ; Yankai Lin ; Xu Han ; Ning Ding ; Huadong Wang ; Ruobing Xie ; Fanchao Qi ; Zhiyuan Liu ; Maosong Sun ; Jie Zhou
Summary: Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses. The de facto paradigm of LFQA necessitates two procedures: information retrieval, which searches for relevant supporting facts, and information synthesis, which integrates these facts into a coherent answer. In this paper, we introduce WebCPM, the first Chinese LFQA dataset. One unique feature of WebCPM is that its information retrieval is based on interactive web search, which engages with a search engine in real time. Following WebGPT, we develop a web search interface. We recruit annotators to search for relevant information using our interface and then answer questions. Meanwhile, the web search behaviors of our annotators would be recorded. In total, we collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions. We fine-tune pre-trained language models to imitate human behaviors for web search and to generate answers based on the collected facts. Our LFQA pipeline, built on these fine-tuned models, generates answers that are no worse than human-written ones in 32.5% and 47.5% of the cases on our dataset and DuReader, respectively.



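Q: What problem does this paper try to solve?
A: The paper addresses information retrieval and information synthesis in long-form question answering (LFQA). Specifically, it focuses on how to gather relevant information through interactive web search and how to synthesize detailed, coherent answers from that information. Conventional LFQA methods typically rely on non-interactive retrieval, which uses the original question as the query and returns a large amount of unfiltered information. Humans facing a complex question, by contrast, search the web interactively in real time: they decompose the question, ask sub-questions in sequence, and search iteratively, which deepens their understanding of the topic and refines the search results. WebCPM (Web-based Chinese Pre-trained Models), introduced in the paper, is the first Chinese LFQA dataset that incorporates interactive web search, and it aims to generate answers as good as or better than human-written ones by imitating human web search behavior.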
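The retrieve-then-synthesize flow described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the paper's implementation: the in-memory `CORPUS`, the action names (`search`, `quote`, `finish`), the `SearchState` class, and the `scripted_policy` function are all hypothetical stand-ins. In the paper's setting a fine-tuned language model would choose each action against a live search engine, and a second model would write the answer.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for a live search engine (hypothetical data).
CORPUS = {
    "what is lfqa": "LFQA answers open-ended questions with paragraph-length responses.",
    "lfqa retrieval": "Retrieval searches for supporting facts relevant to the question.",
    "lfqa synthesis": "Synthesis integrates the collected facts into a coherent answer.",
}


@dataclass
class SearchState:
    question: str
    facts: list = field(default_factory=list)    # supporting facts quoted so far
    actions: list = field(default_factory=list)  # log of (action, argument) pairs


def search(query):
    """Hypothetical search call: exact-match lookup in the toy corpus."""
    return CORPUS.get(query.lower())


def scripted_policy(state):
    """Stand-in for a learned policy: return the next unissued sub-query, or None."""
    plan = ["what is lfqa", "lfqa retrieval", "lfqa synthesis"]
    issued = [arg for action, arg in state.actions if action == "search"]
    remaining = [q for q in plan if q not in issued]
    return remaining[0] if remaining else None


def interactive_retrieve(question, next_query, max_steps=10):
    """Ask the policy for one sub-query at a time and quote each new result."""
    state = SearchState(question)
    for _ in range(max_steps):
        query = next_query(state)
        if query is None:
            break
        state.actions.append(("search", query))
        result = search(query)
        if result is not None and result not in state.facts:
            state.actions.append(("quote", result))
            state.facts.append(result)
    state.actions.append(("finish", ""))
    return state


def synthesize(state):
    """Placeholder synthesis: join the facts. A real system would use a fine-tuned LM."""
    return " ".join(state.facts)


state = interactive_retrieve("How does LFQA work?", scripted_policy)
answer = synthesize(state)
```

The policy sees the running state before each step, which is what separates this loop from non-interactive retrieval, where the original question is issued once as the only query. The action log plays the role of the recorded search behavior that the paper uses as imitation data.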
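Q: What related work is there?
A: Related work falls mainly into the following areas:
These studies provide background and points of comparison for the development of WebCPM, and they also highlight the potential and the challenges of interactive web search for long-form question answering. By providing a public dataset and platform, WebCPM aims to facilitate further research in this area.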
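Q: How does the paper solve this problem?
A: The paper tackles interactive web search for long-form question answering (LFQA) through the following steps: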