Non-CoT

Chain-of-Thought Reasoning Without Prompting

Authors: Xuezhi Wang; Denny Zhou

In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding substantially outperforms the standard greedy decoding.
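
The decoding idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: `TOY_LM` is a hypothetical hand-written next-token table standing in for a real model, branching happens only at the first decoding step, the answer is taken to be the last numeric token, and confidence is measured as the gap between the two most probable tokens at the answer position.

```python
from typing import Dict, List, Optional, Tuple

EOS = "<eos>"

# Hypothetical toy "model": prefix (tuple of tokens) -> next-token distribution.
TOY_LM: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"5": 0.5, "I": 0.3, "You": 0.2},
    ("5",): {EOS: 1.0},
    ("I",): {"have": 0.9, "think": 0.1},
    ("I", "have"): {"3+5": 0.8, "3": 0.2},
    ("I", "have", "3+5"): {"=": 1.0},
    ("I", "have", "3+5", "="): {"8": 0.95, "5": 0.05},
    ("I", "have", "3+5", "=", "8"): {EOS: 1.0},
    ("You",): {"have": 1.0},
    ("You", "have"): {"5": 0.55, "8": 0.45},
    ("You", "have", "5"): {EOS: 1.0},
}


def top2_margin(dist: Dict[str, float]) -> float:
    """Gap between the most and second most probable tokens."""
    probs = sorted(dist.values(), reverse=True)
    return probs[0] - (probs[1] if len(probs) > 1 else 0.0)


def decode_path(first_token: str) -> List[str]:
    """Force the first token, then continue greedily until EOS."""
    tokens = [first_token]
    while tokens[-1] != EOS:
        dist = TOY_LM[tuple(tokens)]
        tokens.append(max(dist, key=dist.get))
    return tokens


def answer_confidence(tokens: List[str]) -> Tuple[Optional[str], float]:
    """Assumed heuristic: the answer is the last purely numeric token."""
    numeric = [i for i, tok in enumerate(tokens) if tok.isdigit()]
    if not numeric:
        return None, 0.0
    i = numeric[-1]
    return tokens[i], top2_margin(TOY_LM[tuple(tokens[:i])])


def cot_decode(k: int) -> Tuple[float, Optional[str], List[str]]:
    """Branch on the top-k first tokens; keep the most confident path."""
    first = TOY_LM[()]
    branches = sorted(first, key=first.get, reverse=True)[:k]
    paths = []
    for tok in branches:
        tokens = decode_path(tok)
        answer, conf = answer_confidence(tokens)
        paths.append((conf, answer, tokens))
    return max(paths, key=lambda p: p[0])
```

With `k=1` this reduces to greedy decoding and returns the direct answer `"5"`. With `k=3` the branch that spells out `3+5 = 8` is selected because its answer token has the largest margin in this toy table.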

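Q: What problem does this paper try to solve?

A: The paper asks how chain-of-thought (CoT) reasoning paths can be elicited from large language models (LLMs) without any specific prompting technique. Specifically, the authors examine the following questions: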

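Can LLMs reason effectively without prompting? The authors find that CoT reasoning paths can be elicited from pre-trained LLMs simply by changing the decoding process, without relying on conventional greedy decoding.
How can the intrinsic reasoning ability of LLMs be assessed? By inspecting the CoT paths that appear during decoding, the authors assess the model's intrinsic reasoning ability in a way that avoids the confounders introduced by prompting.
How does the presence of a CoT path relate to the model's confidence in its decoded answer? The authors observe that when a decoding path contains a CoT, the model is usually more confident in its decoded answer. This confidence metric effectively separates CoT paths from non-CoT paths.
How does CoT-decoding perform on reasoning benchmarks? Extensive empirical studies show that the proposed CoT-decoding substantially outperforms standard greedy decoding across a range of reasoning benchmarks.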
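
The notes above do not spell out how the confidence is computed. One plausible formalization, stated here as an assumption, averages the gap between the two most probable tokens over the positions that make up the answer in the k-th decoding path:

```latex
\Delta_{k,\text{answer}}
  = \frac{1}{|\text{answer}|}
    \sum_{x_t \in \text{answer}}
    \Bigl( p\bigl(x_t^{1} \mid x_{<t}\bigr) - p\bigl(x_t^{2} \mid x_{<t}\bigr) \Bigr)
```

Here $x_t^{1}$ and $x_t^{2}$ denote the top two tokens at decoding step $t$ of the k-th path. A large $\Delta$ means the model strongly prefers its chosen answer tokens over the runner-up.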

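Overall, the paper's core contribution is a new perspective on exploring and exploiting the reasoning ability of LLMs: changing the decoding strategy so that the model's reasoning paths surface naturally, rather than relying on external prompts or additional training. The approach improves performance on a range of reasoning tasks and offers new insight into the model's intrinsic reasoning mechanisms.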

Q: What related work is there?

A: The paper cites a number of studies related to the reasoning ability of large language models (LLMs). The key ones are:

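Chain-of-thought (CoT) reasoning:
- Kojima et al. (2022) proposed zero-shot CoT prompting, which uses a specific instruction to make the model show its intermediate steps.
- Yasunaga et al. (2023) also explored zero-shot CoT prompting for reasoning tasks.
- Wei et al. (2022) studied few-shot CoT prompting as a way to improve model performance on math problem solving.
Model training and instruction tuning:
- Chung et al. (2022) and Cobbe et al. (2021b) improved reasoning ability by training or instruction-tuning models on large amounts of CoT reasoning data.
Decoding algorithms:
- Wang et al. (2023a) proposed a decoding method based on self-consistency, which improves reasoning performance by aggregating multiple generated responses.
- Li et al. (2023a) proposed contrastive decoding, which improves generation quality by penalizing the logits of a smaller model.
Model understanding and knowledge discovery:
- McCoy et al. (2023) and Prystawski et al. (2023) studied how reasoning emerges from experience when models face different tasks.
- Razeghi et al. (2022) analyzed how the distribution of the pre-training data affects model performance on few-shot reasoning tasks.
Model tuning and proxy tuning:
- Liu et al. (2024) proposed tuning a large language model through proxy models to improve its performance on reasoning tasks.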
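
As a point of comparison for the decoding algorithms listed above, the aggregation step of self-consistency can be sketched as a majority vote over the final answers of several sampled responses. This is a minimal sketch under stated assumptions: the function name `self_consistency` is hypothetical, answers are assumed to be already extracted as strings, and `None` marks a response with no parseable answer.

```python
from collections import Counter
from typing import List, Optional


def self_consistency(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent extracted answer, ignoring unparseable ones."""
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None  # no response produced a usable answer
    return votes.most_common(1)[0][0]
```

For example, `self_consistency(["8", "5", "8", None, "8"])` selects `"8"`, since three of the four parseable responses agree on it.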

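These studies provide a rich theoretical and practical basis for understanding the reasoning ability of LLMs, improving their reasoning performance, and exploring their internal mechanisms. Building on them, the CoT-decoding method proposed in the paper further explores how changing the decoding strategy can elicit and exploit the model's intrinsic reasoning ability.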