Improving language models by retrieving from trillions of tokens
https://papers.cool/arxiv/2112.04426
Authors: Sebastian Borgeaud ; Arthur Mensch ; Jordan Hoffmann ; Trevor Cai ; Eliza Rutherford ; Katie Millican ; George van den Driessche ; Jean-Baptiste Lespiau ; Bogdan Damoc ; Aidan Clark ; Diego de Las Casas ; Aurelia Guy ; Jacob Menick ; Roman Ring ; Tom Hennigan ; Saffron Huang ; Loren Maggiore ; Chris Jones ; Albin Cassirer ; Andy Brock ; Michela Paganini ; Geoffrey Irving ; Oriol Vinyals ; Simon Osindero ; Karen Simonyan ; Jack W. Rae ; Erich Elsen ; Laurent Sifre
Summary: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
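Q: What problem does this paper try to solve?
A: This paper proposes a new way to enhance autoregressive language models (LMs): retrieving document chunks from a large corpus to improve model performance. The main problem it addresses is how to give a language model a much larger memory without significantly increasing computation. Specifically, the paper's goals are:
Overall, the paper's core contribution is a new semi-parametric approach, the RETRO model, which improves language model performance by retrieving from a large body of text while keeping the number of model parameters relatively low. This approach offers a new, more efficient route to building more powerful language models.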
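The retrieval step described above (finding corpus chunks that are locally similar to the preceding tokens) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the bag-of-words `embed` function is a stand-in for the frozen BERT retriever, and the helper names `split_chunks`, `build_database`, and `retrieve`, the chunk size of 4, and the example corpus are all hypothetical.

```python
import math
from collections import Counter


def embed(tokens):
    """Toy stand-in for a frozen encoder (assumption): an L2-normalised
    bag-of-words vector stored as a {token: weight} dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def split_chunks(tokens, chunk_size):
    """Split a token list into consecutive fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def build_database(corpus_tokens, chunk_size):
    """Pre-compute one (embedding, chunk) entry per corpus chunk."""
    return [(embed(c), c) for c in split_chunks(corpus_tokens, chunk_size)]


def retrieve(db, query_chunk, k=1):
    """Return the k database chunks most similar to the query chunk."""
    q = embed(query_chunk)
    ranked = sorted(db, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


# Hypothetical example: a tiny corpus and a query built from preceding tokens.
corpus = "the cat sat on the mat the dog ran in the park".split()
db = build_database(corpus, chunk_size=4)
print(retrieve(db, ["in", "the", "park", "today"], k=1))
# prints [['ran', 'in', 'the', 'park']]
```

In a model like the one summarized here, the retrieved chunks would then be encoded and attended to when predicting the following tokens; the sketch covers only the lookup, and at the scale of a 2 trillion token database an exhaustive sort like this one would need to be replaced by an approximate nearest-neighbor index.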
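Q: What related work is there?
A: The paper discusses several research areas and specific works related to retrieval-enhanced language models (RETRO), mainly including: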