谷歌发布开源大模型 Gemma，评测+最佳微调实践来啦！

news2026/2/15 15:16:41

Gemma 是由 Google 推出的一系列轻量级、先进的开源模型，他们是基于 Google Gemini 模型的研究和技术而构建。它们是一系列text generation，decoder-only的大型语言模型，对英文的支持较好，具有模型权重开源、并提供预训练版本（base模型）和指令微调版本（chat模型）。

本次 Gemma 开源提供了四个大型语言模型，提供了 2B 和 7B 两种参数规模的版本，每种都包含了预训练版本（base模型）和指令微调版本（chat模型）。

官方除了提供 pytorch 版本之外，也提供了GGUF版本，可在各类消费级硬件上运行，无需数据量化处理，并拥有高达 8K tokens 的处理能力，Gemma 7B模型的预训练数据高达6万亿Token，也证明了通过大量的高质量数据训练，可以大力出奇迹，小模型也可以持续提升取得好的效果。

那 Gemma 模型的能力怎么样呢？下面是Gemma模型的基础版本与其他开源模型在公开榜单的对比：

数据来源__https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

从榜单中可以看到，Gemma-7B模型超过了Mistral-7B模型，取得了一个很好的结果。技术报告:https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf

开源代码：

https://github.com/google/gemma_pytorch

目前社区已经支持 Gemma的下载、推理、微调一站式体验， 并提供对应最佳实践教程，欢迎感兴趣的开发者小伙伴们来玩！

技术交流

前沿技术资讯、算法交流、求职内推、算法竞赛、面试交流(校招、社招、实习)等、与 10000+来自港科大、北大、清华、中科院、CMU、腾讯、百度等名校名企开发者互动交流~

我们建了大模型面试与技术交流群，想要进交流群、获取完整源码&资料、提升技术的同学，可以直接加微信号：mlc2060。加的时候备注一下：研究方向 +学校/公司+CSDN，即可。然后就可以拉你进群了。

方式①、微信搜索公众号：机器学习社区，后台回复：技术交流
方式②、添加微信号：mlc2060，备注：技术交流

用通俗易懂的方式讲解系列

用通俗易懂的方式讲解：不用再找了，这是大模型最全的面试题库
用通俗易懂的方式讲解：这是我见过的最适合大模型小白的 PyTorch 中文课程
用通俗易懂的方式讲解：一文讲透最热的大模型开发框架 LangChain
用通俗易懂的方式讲解：基于 LangChain + ChatGLM搭建知识本地库
用通俗易懂的方式讲解：基于大模型的知识问答系统全面总结
用通俗易懂的方式讲解：ChatGLM3 基础模型多轮对话微调
用通俗易懂的方式讲解：最火的大模型训练框架 DeepSpeed 详解来了
用通俗易懂的方式讲解：这应该是最全的大模型训练与微调关键技术梳理
用通俗易懂的方式讲解：Stable Diffusion 微调及推理优化实践指南
用通俗易懂的方式讲解：大模型训练过程概述
用通俗易懂的方式讲解：专补大模型短板的RAG
用通俗易懂的方式讲解：大模型LLM Agent在 Text2SQL 应用上的实践
用通俗易懂的方式讲解：大模型 LLM RAG在 Text2SQL 上的应用实践
用通俗易懂的方式讲解：大模型微调方法总结
用通俗易懂的方式讲解：涨知识了，这篇大模型 LangChain 框架与使用示例太棒了
用通俗易懂的方式讲解：掌握大模型这些优化技术，优雅地进行大模型的训练和推理！

环境配置与安装

python 3.10及以上版本
pytorch 1.12及以上版本，推荐2.0及以上版本
建议使用CUDA 11.4及以上
transformers>=4.38.0

Gemma模型链接和下载

支持直接下载模型的repo：

from modelscope import snapshot_download
model_dir = snapshot_download("AI-ModelScope/gemma-7b-it")

Gemma模型推理

需要使用tokenizer.apply_chat_template获取指令微调模型的prompt template：

from modelscope import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("AI-ModelScope/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("AI-ModelScope/gemma-7b-it", torch_dtype = torch.bfloat16, device_map="auto")

input_text = "hello."
messages = [
    {"role": "user", "content": input_text}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
input_ids = tokenizer([text], return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids,max_new_tokens=256)
print(tokenizer.decode(outputs[0]))

资源消耗：

模型微调和微调后推理

我们使用SWIFT来对模型进行微调，SWIFT是魔搭社区官方提供的LLM&AIGC模型微调推理框架。

微调代码开源地址:

https://github.com/modelscope/swift

我们使用hc3-zh分类数据集进行微调. 任务是: 判断数据样本的回答来自human还是chatgpt.

环境准备:

git clone https://github.com/modelscope/swift.git
cd swift
pip install .[llm]

微调脚本: LoRA

# https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/gemma_2b_instruct/lora
# Experimental environment: V100, A10, 3090
# 12GB GPU memory

CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_id_or_path AI-ModelScope/gemma-2b-it \
    --sft_type lora \
    --tuner_backend swift \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --dataset hc3-zh \
    --train_dataset_sample 5000 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.1 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \

训练过程也支持本地数据集，需要指定如下参数：

--custom_train_dataset_path xxx.jsonl \
--custom_val_dataset_path yyy.jsonl \

微调后推理脚本: （这里的ckpt_dir需要修改为训练生成的checkpoint文件夹）

# Experimental environment: V100, A10, 3090
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/gemma-2b-instruct/vx_xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --max_length 2048 \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \

微调的可视化结果

训练准确率:

训练后生成样例:

[PROMPT]<bos><start_of_turn>user
Classification Task: Are the following responses from a human or from ChatGPT?
Question: 能帮忙解决一下吗
Answer: 当然，我很乐意帮助你解决问题。请提出你的问题，我会尽力给出最好的帮助。
Category: Human, ChatGPT
Output:<end_of_turn>
<start_of_turn>model
[OUTPUT]ChatGPT<end_of_turn>

[LABELS]ChatGPT
---------------------------------------------------
[PROMPT]<bos><start_of_turn>user
Classification Task: Are the following responses from a human or from ChatGPT?
Question: 请问哪样存钱好
Answer: 若需了解招商银行存款利率，可进入招行主页在网页右下侧“实时金融信息”下方选择“存款利率”查看。
Category: Human, ChatGPT
Output:<end_of_turn>
<start_of_turn>model
[OUTPUT]Human<end_of_turn>

[LABELS]Human