LLMs之CoD：《Chain of Draft: Thinking Faster by Writing Less》翻译与解读

导读：这篇论文的核心是提出了一种名为“Chain of Draft”（CoD，草稿链）的新型提示策略，用于改进大型语言模型（LLMs）的推理能力，并解决现有方法的效率问题。核心是提出了一种新的、更高效的 LLM 推理方法 CoD，它通过模仿人类的简略思考方式，在保证准确率的同时，大幅降低了推理的成本和延迟，为 LLM 的实际应用提供了新的思路。但是，论文也指出了 CoD 在零样本场景和小型模型上的局限性，这为未来的研究方向提供了指引。

>> 背景痛点：现有的大型语言模型推理方法，例如“Chain of Thought”（CoT，思维链），虽然在复杂推理任务中取得了显著成果，但其步骤冗长，生成大量文本，导致推理过程计算成本高、延迟大，不适用于对效率要求高的实际应用场景。这与人类高效的简略思考和记录关键信息的过程形成鲜明对比。

>> 具体的解决方案：提出Chain of Draft (CoD)。CoD 是一种新型的提示策略，它模仿人类在解决问题时只记录关键信息的思维方式。与 CoT 不同，CoD 鼓励 LLMs 在每个推理步骤中生成简洁、信息密集的输出，从而减少冗余信息。 CoD 的核心在于将冗长的推理步骤浓缩成简短的“草稿”，只保留关键信息和计算结果。

>> 核心思路步骤：CoD 的核心思想是将 CoT 中冗长的推理步骤简化成更简洁的表达。它通过提示词引导模型进行分步推理，但要求每一步推理的输出都尽可能简短（例如，限制在五个单词以内）。这使得模型能够专注于解决问题，而不是生成大量的无关信息。

>> 优势：

● 显著降低延迟和成本：CoD在保持或提高准确率的同时，大幅减少了 token 使用量和推理时间。实验结果表明，CoD的 token 使用量仅为 CoT 的 7.6%，延迟也大幅降低。

● 保持或提高准确率：在算术推理、常识推理和符号推理等多种任务上，CoD 的准确率与 CoT 相当甚至更好。

● 提高效率：CoD 的简洁性使其更适合于资源受限的实际应用场景。

>> 论文结论和观点：

● CoD 是一种有效的 LLM 推理策略，它在保持或提高准确率的同时，显著降低了计算成本和延迟。

● CoD 的简洁性使其更适用于实际应用场景，尤其是在资源受限的环境中。

● CoD 的成功表明，有效的 LLM 推理并不一定需要冗长的输出。

● 未来的研究可以探索将 CoD 与其他降低延迟的方法结合起来，进一步优化性能。

● CoD 的理念可以启发新的 LLM 设计策略，例如使用简洁的推理数据进行训练。

● CoD 在零样本设置和小型模型上的表现较差，这表明需要进一步研究如何改进 CoD 在这些场景下的性能，例如通过使用 CoD 格式的数据进行微调。

《Chain of Draft: Thinking Faster by Writing Less》翻译与解读

Abstract

1、Introduction

Figure 1:Comparison of Claude 3.5 Sonnet’s accuracy and token usage across different tasks with three different prompt strategies: direct answer (Standard), Chain of Thought (CoT), and Chain of Draft (CoD). CoD achieves similar accuracy as CoT while using significant fewer tokens.图 1：在三种不同的提示策略（直接回答（标准）、思维链（CoT）和草稿链（CoD））下，Claude 3.5 生成十四行诗在不同任务中的准确性和标记使用情况对比。CoD 达到了与 CoT 相似的准确率，但使用的标记数量显著更少。

Discussion

《Chain of Draft: Thinking Faster by Writing Less》翻译与解读

地址	论文地址：[2502.18600v2] Chain of Draft: Thinking Faster by Writing Less
时间	2025年2月25日
作者	Zoom团队

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermediate thoughts that capture only essential information. In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tasks. By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks. Our code and data are available at this https URL.

大型语言模型（LLMs）通过诸如链式思维（CoT）提示等机制在解决复杂推理任务方面表现出色，这种机制强调冗长、逐步的推理过程。然而，人类通常采用一种更高效的策略：起草简洁的中间想法，仅捕捉关键信息。在本研究中，我们提出了链式草稿（CoD）这一新范式，它受人类认知过程启发，让 LLM 在解决任务时生成简洁但信息丰富的中间推理输出。通过减少冗余并专注于关键见解，CoD 在准确性方面与 CoT 相当甚至更优，同时仅使用 7.6% 的标记量，显著降低了各种推理任务的成本和延迟。我们的代码和数据可在该 https URL 获取。

1、Introduction

Recent advances in reasoning models such as OpenAI o1 OpenAI (2024) and DeepSeek R1 Guo et al. (2025) have propelled large language models (LLMs) to unprecedented performance on complex tasks using techniques like Chain of Thought (CoT) Wei et al. (2022). This paradigm encourages models to break down problems into step-by-step explorations, mimicking the structured reasoning process of humans. While effective, this approach demands substantially more computational resources at inference time, leading to verbose outputs and higher latency. Such verbosity contrasts sharply with how humans typically approach problem-solving: we rely on concise drafts or shorthand notes to capture essential insights without unnecessary elaboration.

Motivated by this difference, we propose Chain of Draft (CoD), a novel prompting strategy that aligns more closely with human reasoning by prioritizing efficiency and minimalism. Instead of verbose intermediate steps, Chain of Draft encourages LLMs to generate concise, dense-information outputs at each step. This approach reduces latency and computational costs without sacrifice of accuracy, making LLMs more practical for real-world applications where efficiency is paramount.

近期，诸如 OpenAI 的 o1（OpenAI，2024 年）和 DeepSeek 的 R1（Guo 等人，2025 年）等推理模型的进展，借助链式思维（CoT）等技术（Wei 等人，2022 年），使大型语言模型（LLMs）在复杂任务上的表现达到了前所未有的高度。这种范式鼓励模型将问题分解为逐步探索的过程，模仿人类结构化的推理流程。尽管有效，但这种方法在推理时需要大量的计算资源，导致输出冗长且延迟更高。这种冗长性与人类解决问题的方式形成了鲜明对比：我们通常依靠简洁的草稿或简略笔记来捕捉关键见解，而无需不必要的详述。

鉴于这种差异，我们提出了“草稿链”（CoD），这是一种新颖的提示策略，通过优先考虑效率和简约性，更贴近人类的推理方式。与冗长的中间步骤不同，“草稿链”鼓励 LLM 在每一步生成简洁、信息密集型的输出。这种方法在不牺牲准确性的情况下降低了延迟和计算成本，使大型语言模型在效率至关重要的实际应用中更具实用性。

The intuition behind Chain of Draft is rooted in how humans externalize thought. When solving complex tasks — whether solving mathematical problems, drafting essays, or coding — we often jot down only the critical pieces of information that help us progress. By emulating this behavior, LLMs can focus on advancing toward solutions without the overhead of verbose reasoning.

To evaluate the effectiveness of Chain of Draft, we conducted experiments across a variety of benchmarks requiring multi-step reasoning, including arithmetic reasoning, common sense reasoning, and symbolic reasoning. Our results demonstrate that this minimalist approach maintains or even improves accuracy compared with standard Chain of Thought, while significantly reducing token usage and latency.

“草稿链”的直觉源于人类如何将思维外化。在解决复杂任务时——无论是解决数学问题、起草文章还是编写代码——我们通常只记录有助于我们推进的关键信息。通过模仿这种行为，大型语言模型可以专注于向解决方案推进，而无需冗长推理带来的开销。

为了评估“草稿链”的有效性，我们在需要多步推理的各种基准测试上进行了实验，包括算术推理、常识推理和符号推理。我们的结果表明，与标准的“思维链”相比，这种极简主义方法在保持甚至提高准确性的同时，显著减少了标记使用量和延迟。

The contributions of this paper are threefold:

• We introduce Chain of Draft, a concise reasoning prompting strategy inspired by human cognitive processes.

• We empirically validate that Chain of Draft can achieve significantly reduced latency and cost without sacrificing accuracy.

• We discuss the implications of Chain of Draft for LLM design, deployment, and real-world usability.

本文的贡献有三方面：

• 我们引入了“草稿链”，这是一种受人类认知过程启发的简洁推理提示策略。• 我们通过实证验证了“草稿链”能够在不牺牲准确性的情况下显著降低延迟和成本。

• 我们探讨了“草稿链”对大型语言模型设计、部署以及实际应用的影响。

Figure 1:Comparison of Claude 3.5 Sonnet’s accuracy and token usage across different tasks with three different prompt strategies: direct answer (Standard), Chain of Thought (CoT), and Chain of Draft (CoD). CoD achieves similar accuracy as CoT while using significant fewer tokens.图 1：在三种不同的提示策略（直接回答（标准）、思维链（CoT）和草稿链（CoD））下，Claude 3.5 生成十四行诗在不同任务中的准确性和标记使用情况对比。CoD 达到了与 CoT 相似的准确率，但使用的标记数量显著更少。

Discussion

The latency issue has often been overlooked in studies of the reasoning capabilities of LLMs. However, it is crucial for lots of real-time applications to have low latency while maintaining high-quality responses. In this work, we propose Chain of Draft (CoD), a novel approach that substantially reduces the latency required for reasoning while achieving comparable or even superior accuracy compared to standard Chain-of-Thought prompting strategies. Unlike traditional methods that often involve lengthy reasoning steps, CoD leverages concise reasoning drafts to speed up response generation without sacrificing correctness.

在对大型语言模型（LLM）推理能力的研究中，延迟问题常常被忽视。然而，对于许多实时应用来说，在保持高质量响应的同时实现低延迟至关重要。在本研究中，我们提出了“草稿链”（CoD）这一新颖方法，它能大幅降低推理所需的延迟，同时在准确性方面与标准的“思维链”提示策略相比，达到相当甚至更优的水平。与通常涉及冗长推理步骤的传统方法不同，CoD 利用简洁的推理草稿来加快响应生成速度，同时不牺牲正确性。

Additionally, CoD offers significant cost advantages. By compacting the reasoning steps, it reduces the number of input tokens required for few-shot prompting and shortens the output token length, directly lowering computational cost. This token efficiency makes CoD especially appealing in cost-sensitive scenarios, such as large-scale deployments of LLMs or applications with strict budget constraints.

CoD demonstrates that effective reasoning in LLMs does not necessarily require lengthy outputs, offering an alternative approach where reasoning depth is maintained with minimal verbosity. Future work could explore combining CoD with other latency-reducing methods, such as adaptive parallel reasoning or multi-pass validation, to further optimize performance across different application domains. In addition, the principles behind the compact reasoning of CoD could inspire new strategies to improve reasoning models by training with compact reasoning data, while maintaining interpretability and efficiency in LLMs, helping bridge the gap between research-driven improvements in reasoning and the practical demands of real world systems.

此外，CoD 还具有显著的成本优势。通过压缩推理步骤，它减少了少样本提示所需的输入标记数量，并缩短了输出标记长度，直接降低了计算成本。这种标记效率使 CoD 在成本敏感的场景中特别具有吸引力，例如 LLM 的大规模部署或预算严格的场景。

CoD 证明了在大型语言模型中进行有效推理不一定需要冗长的输出，提供了一种替代方法，在保持推理深度的同时尽量减少冗余。未来的研究可以探索将 CoD 与其他降低延迟的方法（如自适应并行推理或多轮验证）相结合，以进一步优化不同应用领域的性能。此外，CoD 背后的紧凑推理原则可以启发新的策略，通过使用紧凑推理数据进行训练来改进推理模型，同时保持 LLM 的可解释性和效率，有助于弥合推理研究驱动的改进与现实世界系统实际需求之间的差距。