0. Introduction
The paper "Toolformer: Language Models Can Teach Themselves to Use Tools" examines the capabilities and limitations of language models (LMs) on new tasks and proposes a new method called Toolformer. The method connects external tools to LMs through simple API interfaces and lets the LM train itself in a self-supervised way. Experiments show that Toolformer substantially improves zero-shot performance on downstream tasks while preserving core language modeling ability, and in many cases it is even competitive with much larger models.
1. Data Processing
Data processing aims to produce annotated text like the example shown in the paper, where the color-boxed spans are API calls: tool + arguments + answer. How is this data generated? The paper uses few-shot prompting: a large model is given a handful of examples and asked to generate data samples that follow this format.
The tools and their corresponding prompts are listed below (a short code sketch after the prompt list shows how these templates are instantiated):
- Question answering
Your task is to add calls to a Question
Answering API to a piece of text.
The questions should help you get
information required to complete the
text. You can call the API by writing
"[QA(question)]" where "question" is the
question you want to ask. Here are some
examples of API calls:
Input: Joe Biden was born in Scranton,
Pennsylvania.
Output: Joe Biden was born in [QA("Where
was Joe Biden born?")] Scranton,
[QA("In which state is Scranton?")]
Pennsylvania.
Input: Coca-Cola, or Coke, is a
carbonated soft drink manufactured by
the Coca-Cola Company.
Output: Coca-Cola, or [QA("What other
name is Coca-Cola known by?")] Coke, is
a carbonated soft drink manufactured by
[QA("Who manufactures Coca-Cola?")] the
Coca-Cola Company.
Input: x
Output:
- Calculator
Your task is to add calls to a
Calculator API to a piece of text.
The calls should help you get
information required to complete the
text. You can call the API by writing
"[Calculator(expression)]" where
"expression" is the expression to be
computed. Here are some examples of API
calls:
Input: The number in the next term is 18
+ 12 x 3 = 54.
Output: The number in the next term is
18 + 12 x 3 = [Calculator(18 + 12 * 3)]
54.
Input: The population is 658,893 people.
This is 11.4% of the national average of
5,763,868 people.
Output: The population is 658,893 people.
This is 11.4% of the national average of
[Calculator(658,893 / 11.4%)] 5,763,868
people.
Input: A total of 252 qualifying matches
were played, and 723 goals were scored
(an average of 2.87 per match). This is
three times less than the 2169 goals
last year.
Output: A total of 252 qualifying
matches were played, and 723 goals were
scored (an average of [Calculator(723
/ 252)] 2.87 per match). This is twenty
goals more than the [Calculator(723 -
20)] 703 goals last year.
Input: I went to Paris in 1994 and
stayed there until 2011, so in total,
it was 17 years.
Output: I went to Paris in 1994 and
stayed there until 2011, so in total, it
was [Calculator(2011 - 1994)] 17 years.
Input: From this, we have 4 * 30 minutes
= 120 minutes.
Output: From this, we have 4 * 30
minutes = [Calculator(4 * 30)] 120
minutes.
Input: x
Output:
- Wikipedia search
Your task is to complete a given piece
of text. You can use a Wikipedia Search
API to look up information. You can do
so by writing "[WikiSearch(term)]" where
"term" is the search term you want to
look up. Here are some examples of API
calls:
Input: The colors on the flag of Ghana
have the following meanings: red is for
the blood of martyrs, green for forests,
and gold for mineral wealth.
Output: The colors on the flag of Ghana
have the following meanings: red is for
[WikiSearch("Ghana flag red meaning")]
the blood of martyrs, green for forests,
and gold for mineral wealth.
Input: But what are the risks during
production of nanomaterials? Some nanomaterials may give rise to various
kinds of lung damage.
Output: But what are the risks
during production of nanomaterials?
[WikiSearch("nanomaterial production
risks")] Some nanomaterials may give
rise to various kinds of lung damage.
Input: Metformin is the first-line drug
for patients with type 2 diabetes and
obesity.
Output: Metformin is the first-line drug
for [WikiSearch("Metformin first-line
drug")] patients with type 2 diabetes
and obesity.
Input: x
Output:
- Machine translation
Your task is to complete a given piece
of text by using a Machine Translation
API.
You can do so by writing "[MT(text)]"
where text is the text to be translated
into English.
Here are some examples:
Input: He has published one book: O
homem suprimido (“The Supressed Man”)
Output: He has published one book: O
homem suprimido [MT(O homem suprimido)]
(“The Supressed Man”)
Input: In Morris de Jonge’s Jeschuah,
der klassische jüdische Mann, there is a
description of a Jewish writer
Output: In Morris de Jonge’s Jeschuah,
der klassische jüdische Mann [MT(der
klassische jüdische Mann)], there is a
description of a Jewish writer
Input: 南 京 高 淳 县 住 房 和 城 乡 建 设 局 城 市 新
区 设 计 a plane of reference Gaochun is
one of seven districts of the provincial
capital Nanjing
Output: [MT(南京高淳县住房和城乡建设局 城市新
区 设 计)] a plane of reference Gaochun is
one of seven districts of the provincial
capital Nanjing
Input: x
Output:
- Calendar
Your task is to add calls to a Calendar
API to a piece of text. The API calls
should help you get information required
to complete the text. You can call the
API by writing "[Calendar()]" Here are
some examples of API calls:
Input: Today is the first Friday of the
year.
Output: Today is the first [Calendar()]
Friday of the year.
Input: The president of the United
States is Joe Biden.
Output: The president of the United
States is [Calendar()] Joe Biden.
Input: The current day of the week is
Wednesday.
Output: The current day of the week is
[Calendar()] Wednesday.
Input: The number of days from now until
Christmas is 30.
Output: The number of days from now
until Christmas is [Calendar()] 30.
Input: The store is never open on the
weekend, so today it is closed.
Output: The store is never open on the
weekend, so today [Calendar()] it is
closed.
Input: x
Output:
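To make the few-shot annotation step concrete, below is a minimal sketch of how one of the templates above could be filled with a text x before it is sent to the annotating model. It is an illustration only, not the paper's released code: the template is shortened to a single example and the helper name build_annotation_prompt is made up for this sketch.

```python
# A shortened version of the Calculator prompt from the paper; the real
# template contains several few-shot examples.
CALCULATOR_PROMPT = """Your task is to add calls to a Calculator API to a piece of text.
The calls should help you get information required to complete the text.
You can call the API by writing "[Calculator(expression)]" where "expression"
is the expression to be computed. Here are some examples of API calls:

Input: The number in the next term is 18 + 12 x 3 = 54.
Output: The number in the next term is 18 + 12 x 3 = [Calculator(18 + 12 * 3)] 54.

Input: {x}
Output:"""


def build_annotation_prompt(template: str, x: str) -> str:
    """Substitute the text to be annotated into the few-shot template."""
    return template.format(x=x)


if __name__ == "__main__":
    text = "The population is 658,893 people. This is 11.4% of the national average."
    print(build_annotation_prompt(CALCULATOR_PROMPT, text))
```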
2. Overall Pipeline
An API call is represented as a tuple:

$c = (a_c, i_c)$

where:
(1) $c$ denotes the API call;
(2) $a_c$ is the name of the API;
(3) $i_c$ is the corresponding input.
Depending on whether the API result is included, the call is written in one of two linearized forms:

$e(c) = \text{<API>}\, a_c(i_c)\, \text{</API>}$

$e(c, r) = \text{<API>}\, a_c(i_c) \rightarrow r\, \text{</API>}$

Here "<API>", "</API>", and "→" are special tokens marking a span that requires calling an external tool; in practice they are represented by "[", "]", and "->" respectively.
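As a small illustration of this linearization (the helper name linearize_call is made up for this sketch, not taken from the paper's code):

```python
from typing import Optional


def linearize_call(api_name: str, api_input: str, result: Optional[str] = None) -> str:
    """Return the text form of an API call, with or without its result."""
    call = f"{api_name}({api_input})"
    if result is None:
        return f"[{call}]"                 # e(c): call without result
    return f"[{call} -> {result}]"         # e(c, r): call with result


print(linearize_call("Calculator", "18 + 12 * 3"))        # [Calculator(18 + 12 * 3)]
print(linearize_call("Calculator", "18 + 12 * 3", "54"))  # [Calculator(18 + 12 * 3) -> 54]
```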
2.1 Sampling API Calls
Using the prompts from the previous section, the language model is asked to generate candidate API calls automatically, as illustrated in the paper.

For a model $M$, write $p_M(z_{n+1} \mid z_1, \dots, z_n)$ for the probability that $M$ assigns to $z_{n+1}$ as the continuation of the sequence $z_1, \dots, z_n$.

Then, for each $i \in \{1, \dots, n\}$, the probability of starting an API call at position $i$ is:

$p_i = p_M(\text{<API>} \mid P(\mathbf{x}), x_{1:i-1})$

where $P(\mathbf{x})$ denotes the annotation prompt from the previous section applied to the text $\mathbf{x}$. Given a sampling threshold $\tau_s$ and an upper bound $k$ on the number of candidate positions, the set of insertion positions is:

$I = \{\, i \mid p_i > \tau_s \,\}$

If more than $k$ positions satisfy the threshold, only the $k$ positions with the highest probabilities are kept.
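The position-selection step can be sketched as follows. The probabilities p_i would come from the model's next-token distribution for the "[" token conditioned on P(x) and x_{1:i-1}; here they are passed in as a plain dictionary, and the function name sample_call_positions is an illustrative choice, not the paper's code.

```python
from typing import Dict, List


def sample_call_positions(p: Dict[int, float], tau_s: float, k: int) -> List[int]:
    """Return up to k positions i with p_i > tau_s, highest probability first."""
    candidates = [i for i, p_i in p.items() if p_i > tau_s]
    candidates.sort(key=lambda i: p[i], reverse=True)
    return candidates[:k]


# Example: probabilities of starting an API call at each token position.
p = {3: 0.02, 7: 0.31, 12: 0.18, 20: 0.44}
print(sample_call_positions(p, tau_s=0.05, k=2))   # [20, 7]
```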
2.2 Executing API Calls
The next step is to execute all sampled API calls to obtain their results. Depending on the tool, this may involve calling another neural network, running a Python script, or using a retrieval system to search a large corpus.
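A simplified dispatcher for this step might look like the sketch below. Only the calculator and the calendar are self-contained; the question answering, Wikipedia search, and machine translation tools would wrap another model or a retrieval index and are left as stubs. The function name execute_call and the eval-based calculator are assumptions of this sketch, not the paper's implementation.

```python
import datetime


def execute_call(api_name: str, api_input: str) -> str:
    """Execute one sampled API call and return its textual result."""
    if api_name == "Calculator":
        # Restrict eval to arithmetic characters; a real implementation
        # would use a proper expression parser instead of eval.
        allowed = set("0123456789+-*/(). ")
        if api_input and set(api_input) <= allowed:
            return str(round(eval(api_input), 2))
        return ""
    if api_name == "Calendar":
        return datetime.date.today().strftime("Today is %A, %B %d, %Y.")
    # QA / WikiSearch / MT would call another model or a retrieval system here.
    raise NotImplementedError(f"Tool not wired up in this sketch: {api_name}")


print(execute_call("Calculator", "723 / 252"))   # "2.87"
print(execute_call("Calendar", ""))
```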
2.3 Filtering API Calls
First, a weighted cross-entropy loss is defined:

$L_i(\mathbf{z}) = -\sum_{j=i}^{n} \omega_{j-i} \cdot \log p_M(x_j \mid \mathbf{z}, x_{1:j-1})$

where $\mathbf{z}$ is a sequence inserted as a prefix before position $i$, and the weights $\omega_{j-i}$ give less importance to tokens far away from the API call.
Two losses are then compared:

$L_i^+ = L_i(e(c_i, r_i))$

$L_i^- = \min\big(L_i(\epsilon),\, L_i(e(c_i, \epsilon))\big)$

where:
(1) $\epsilon$ denotes the empty sequence;
(2) $L_i(\epsilon)$ is the loss with no API call inserted;
(3) $L_i(e(c_i, \epsilon))$ is the loss when the API call is inserted but its output is empty;
(4) $L_i(e(c_i, r_i))$ is the loss when the API call is inserted together with its output.
Given a filtering threshold $\tau_f$, the API call is kept when:

$L_i^- - L_i^+ \ge \tau_f$

That is, an API call is considered useful only if inserting the call together with its result lowers the loss by at least $\tau_f$ compared with either making no call at all or making the call without its result.
For example, the paper lists, for a given threshold $\tau_f$, which API calls are kept and which are discarded (see the corresponding table in the paper).
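A sketch of the filtering step is shown below. The weighting scheme in weighted_ce_loss (linearly decaying weights with an illustrative decay rate) and the made-up log-probabilities in the example are assumptions; in practice the per-token log-probabilities would come from the language model M.

```python
from typing import List


def weighted_ce_loss(token_logprobs: List[float], decay: float = 0.2) -> float:
    """L_i(z): negative weighted sum of log p_M(x_j | z, x_{1:j-1}) for j >= i."""
    raw = [max(0.0, 1.0 - decay * t) for t in range(len(token_logprobs))]
    total = sum(raw) or 1.0
    return -sum((w / total) * lp for w, lp in zip(raw, token_logprobs))


def keep_api_call(l_with_result: float, l_no_call: float,
                  l_empty_result: float, tau_f: float) -> bool:
    """Keep the call if L_i^- - L_i^+ >= tau_f."""
    l_plus = l_with_result                            # L_i(e(c_i, r_i))
    l_minus = min(l_no_call, l_empty_result)          # min(L_i(eps), L_i(e(c_i, eps)))
    return l_minus - l_plus >= tau_f


# Example with made-up log-probabilities of the continuation tokens:
l_plus = weighted_ce_loss([-0.3, -0.5, -0.2])     # call + result inserted
l_no_call = weighted_ce_loss([-1.4, -1.1, -0.9])  # nothing inserted
l_empty = weighted_ce_loss([-1.3, -1.0, -0.8])    # call inserted, empty result
print(keep_api_call(l_plus, l_no_call, l_empty, tau_f=0.5))   # True
```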
2.4 Model Fine-tuning
After filtering the API calls, a new sequence is built:

$\mathbf{x}^* = x_{1:i-1},\, e(c_i, r_i),\, x_{i:n}$

The model is then fine-tuned on this augmented dataset with the standard language modeling objective.
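A minimal sketch of how the augmented sequence x* could be spliced together (the function name and the 1-based position convention follow the formula above; this is an illustration, not the paper's code):

```python
from typing import List


def splice_api_call(tokens: List[str], i: int, call_with_result: str) -> List[str]:
    """x* = x_{1:i-1}, e(c_i, r_i), x_{i:n}: insert the call text before x_i (1-based)."""
    return tokens[: i - 1] + [call_with_result] + tokens[i - 1 :]


tokens = "723 goals were scored in 252 matches , an average of 2.87 per match .".split()
print(" ".join(splice_api_call(tokens, 12, "[Calculator(723 / 252) -> 2.87]")))
```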
2.5 Inference
When the fine-tuned model generates text, decoding is interrupted as soon as it produces the "→" token, which signals that it expects the response of an API call next. The appropriate API is then called, its response and the "</API>" token are inserted, and decoding resumes.
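The decoding loop can be sketched as follows. Both generate_until (a decoder that stops either at end-of-text or right after emitting "->", returning the generated chunk including the stop string and a flag) and execute_call (a tool dispatcher such as the sketch in section 2.2) are assumptions of this illustration, not the paper's implementation.

```python
def generate_with_tools(prompt: str, generate_until, execute_call, max_calls: int = 5) -> str:
    """Decode text, pausing at each "->" to execute the pending API call."""
    text = prompt
    for _ in range(max_calls):
        # chunk includes "->" when that stop string was produced.
        chunk, stopped_on_arrow = generate_until(text, stop="->")
        text += chunk
        if not stopped_on_arrow:
            break                                   # normal end of generation
        # Parse the pending call "[Tool(input) ->" from the generated text.
        call_start = text.rfind("[")
        api_name, _, rest = text[call_start + 1 :].partition("(")
        api_input = rest.rsplit(")", 1)[0]
        # Insert the tool response and the closing "]" token, then resume decoding.
        text += f" {execute_call(api_name, api_input.strip())}]"
    return text
```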
3. Third-party Tools
The third-party tools, introduced above, are:
- Question answering
- Calculator
- Wikipedia search
- Machine translation
- Calendar
Example inputs and outputs for each tool are shown in the paper.
4. Experimental Results
4.1 Experimental Setup
Data: a subset of CCNet is used as the model's training dataset;
Fine-tuning: the model is fine-tuned with a batch size of 128 and a learning rate of 1×10^-5;
Compared models:
- GPT-J: a vanilla GPT-J model without any fine-tuning.
- GPT-J + CC: GPT-J fine-tuned on the CCNet subset without API calls.
- Toolformer: GPT-J fine-tuned on the dataset augmented with API calls.
- Toolformer (disabled): the Toolformer model with API calls disabled.
4.2 Downstream Tasks
The models are evaluated on a range of downstream tasks, including LAMA, math datasets, question answering datasets, multilingual question answering, and temporal datasets. A zero-shot setting is used: the model receives the task instruction without any task-specific in-context examples.
Toolformer clearly outperforms GPT-J on these tasks, improves further when API calls are enabled, and shows no noticeable loss in language modeling ability.
5. References
[1] Toolformer: Language Models Can Teach Themselves to Use Tools. https://arxiv.org/pdf/2302.04761