Topic of this post: LLaMA model training data, training time, power consumption, and carbon emissions
LLaMA: Open and Efficient Foundation Language Models
paper https://arxiv.org/pdf/2302.13971v1.pdf
1 Training Data
Overall, our entire training dataset contains roughly 1.4T tokens after tokenization. For most of our training data, each token is used only once during training, with the exception of the Wikipedia
and Books domains, over which we perform approximately two epochs.
- The sources and proportions of the training data are listed in the paper's data table; after cleaning and deduplication, roughly 1.4T tokens remain (1.4T = 1.4e12).
- The number of passes over each source is given in its Epochs column: most data is seen only once during training, but the Books and Wikipedia data are trained for roughly two epochs (presumably because this data is of higher value).
2 Training Time
When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days.
Training the 65B-parameter model:
- Number of GPUs: 2048
- GPU model: A100, 80GB
- Training data: 1.4T tokens
- Throughput: 380 tokens/s/GPU
- Training time: ~21 days (computed as follows)
$t = \dfrac{1.4 \times 10^{12}}{2048 \times 380 \times 3600 \times 24} \approx 21\ \text{days}$
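To make the arithmetic explicit, below is a minimal Python sketch of this training-time estimate. The function name `training_days` is just for illustration; the numbers are the ones quoted from the paper above (1.4T tokens, 2048 GPUs, 380 tokens/s/GPU).

```python
def training_days(total_tokens: float, num_gpus: int, tokens_per_sec_per_gpu: float) -> float:
    """Estimate wall-clock training time in days from aggregate cluster throughput."""
    cluster_tokens_per_sec = num_gpus * tokens_per_sec_per_gpu  # whole-cluster throughput
    total_seconds = total_tokens / cluster_tokens_per_sec       # wall-clock seconds
    return total_seconds / (3600 * 24)                          # seconds -> days

# LLaMA-65B figures: 1.4T tokens, 2048 A100 GPUs, 380 tokens/s/GPU
days = training_days(total_tokens=1.4e12, num_gpus=2048, tokens_per_sec_per_gpu=380)
print(f"~{days:.1f} days")  # prints ~20.8 days, i.e. roughly 21 days
```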
3 Carbon Emissions
- Estimated energy use in watt-hours (Wh):
$Wh = \text{GPU-h} \times (\text{GPU power in W}) \times PUE$
PUE: Power Usage Effectiveness.
The carbon emissions are then estimated as:
$tCO_2eq = MWh \times 0.385$
we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models. This means that developing these models would have cost around 2,638 MWh under our assumptions, and a total emission of 1,015 tCO2eq.
In total, 2048 A100 80GB GPUs were used for roughly 5 months of development, consuming about 2,638 MWh and emitting roughly 1,015 tCO2eq.
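The two formulas above can also be written as a short sketch. Note that the per-GPU power draw (400 W) and the PUE (1.1) used here are assumptions for illustration, not values quoted in this post, so the resulting MWh is only a ballpark and differs from the paper's 2,638 MWh total; only the 0.385 tCO2eq/MWh factor and the 2,638 MWh figure come from the text above.

```python
def energy_mwh(gpu_hours: float, gpu_watts: float, pue: float) -> float:
    """Wh = GPU-hours * GPU power (W) * PUE, converted to MWh."""
    return gpu_hours * gpu_watts * pue / 1e6

def carbon_tco2eq(mwh: float, grid_factor: float = 0.385) -> float:
    """tCO2eq = MWh * 0.385 (the emission factor quoted above)."""
    return mwh * grid_factor

# Assumptions (not from this post): 400 W per GPU, PUE = 1.1
gpu_hours = 2048 * 5 * 30 * 24  # 2048 GPUs for ~5 months
print(f"{energy_mwh(gpu_hours, 400, 1.1):.0f} MWh")  # ~3244 MWh, a rough ballpark only

# Using the paper's stated total of 2,638 MWh recovers its reported emissions:
print(f"{carbon_tco2eq(2638):.0f} tCO2eq")  # ~1016, i.e. roughly the 1,015 tCO2eq above
```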
4 Thoughts
We hope that releasing these models will help to reduce future carbon emission since the training is already done, and some of the models are relatively small and can be run on a single GPU.
The hope is that releasing more large models lets future work build on training that has already been done instead of repeating it, reducing duplicated development and therefore carbon emissions.