Fine-tuning DeepSeek-R1-Distill-Llama-8B with an NVIDIA GeForce RTX 4090 on Ubuntu 22.04

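Environment setup
- Create the Python fine-tuning environment
- Prepare the dataset
- Prepare the model files
Model fine-tuning
Model inference
- Original model inference
- Fine-tuned model inference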

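Unsloth makes it straightforward to fine-tune large language models. This article uses DeepSeek-R1-Distill-Llama-8B as the example.

Fine-tuning needs roughly 9 GB or more of GPU memory.
Training time depends on the training parameters per_device_train_batch_size and max_steps. With per_device_train_batch_size set to 2, one training step takes about 3 seconds on the RTX 4090, so 600 steps take about 30 minutes and 10,000 steps take about 8 hours.
In a preliminary test, the training loss fell from about 2.3 at the start to about 1.1 after 8 hours of training. The loss after 24 hours of training is still being tested.
The validation shown in the rest of this article uses only the model trained for 600 steps.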
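The timing figures above follow from simple arithmetic. Here is a minimal sketch of it, assuming the roughly 3 seconds per step quoted above holds throughout the run; the function name `estimate_training_hours` is made up for this illustration.

```python
def estimate_training_hours(max_steps: int, seconds_per_step: float = 3.0) -> float:
    """Estimate wall-clock training time in hours from the step count.

    seconds_per_step defaults to the ~3 s per step observed on the RTX 4090
    with per_device_train_batch_size = 2; it is an approximation, not a constant.
    """
    return max_steps * seconds_per_step / 3600


for steps in (600, 10_000):
    print(f"{steps} steps: about {estimate_training_hours(steps):.1f} h")
# 600 steps: about 0.5 h
# 10000 steps: about 8.3 h
```

The real time per step will vary with sequence length and with whatever else is using the GPU, so treat the result as a rough planning number.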

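Original model output (translated from Chinese):

Based on the patient's symptoms and examination results, the following steps are recommended:

1. Further diagnosis: perform an abdominal ultrasound to clarify the nature of the mass and determine whether it is an abscess, hepatitis, or another structure.
2. Laboratory tests: including a complete blood count, C-reactive protein, tuberculin test, blood culture, and liver and kidney function tests, to assess the infection.
3. Antibiotic therapy: based on the laboratory results, start treatment with a suitable antibiotic that covers the likely bacterial colonies.
4. Imaging and drainage: if an abscess is confirmed, consider laparoscopic drainage or surgery.
5. Combined treatment: combine antibiotic therapy with supportive care such as nutritional supplementation and pain management.
6. Monitoring and follow-up: re-examine regularly, monitor disease progression, and assess the treatment effect.

Fine-tuned model output (translated from Chinese):

Based on the patient's symptoms and physical examination, although there is a tender mass in the right lower abdomen, the patient's symptoms have not yet fully stabilized: the abdominal pain has eased only slightly and the fever persists. In this situation, immediate surgery is usually not needed. Typically, acute appendicitis can be relieved through observation, medication, and nutritional support. However, the presence of the mass may indicate that the inflammation has progressed to a certain degree and needs close monitoring.

In this case, it is advisable to continue antibiotics and analgesics to relieve the symptoms and to watch closely for changes in the condition. If the condition worsens, for example if the mass enlarges, the abdominal pain intensifies, or there is persistent fever, vomiting, or yellowing, further management may need to be considered, including imaging or surgery.

In summary, the current strategy is to continue observation and treatment while closely monitoring changes in the condition, so that the treatment plan can be adjusted promptly. Hopefully the patient recovers soon.<｜end▁of▁sentence｜>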


Loss over 10,000 training steps:

[Figure: training loss curve over 10,000 steps]

The rest of this article is a write-up of my notes from the fine-tuning process.

Environment setup

Create the Python fine-tuning environment

Create a Python virtual environment and activate it.

conda create -n unsloth_sft python=3.10
conda activate unsloth_sft

Create a requirements file:

(unsloth) ubuntu@ubuntu-server:~/train$ cat requirements.txt 
accelerate==1.3.0
aiohappyeyeballs==2.4.6
aiohttp==3.11.12
aiohttp-cors==0.7.0
aiosignal==1.3.2
airportsdata==20241001
annotated-types==0.7.0
anyio==4.8.0
astor==0.8.1
async-timeout==5.0.1
attrs==25.1.0
bitsandbytes==0.45.2
blake3==1.0.4
cachetools==5.5.1
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
colorful==0.5.6
compressed-tensors==0.9.1
cut-cross-entropy==25.1.1
datasets==3.2.0
depyf==0.18.0
dill==0.3.8
diskcache==5.6.3
distlib==0.3.9
distro==1.9.0
docstring_parser==0.16
einops==0.8.1
exceptiongroup==1.2.2
fastapi==0.115.8
filelock==3.17.0
frozenlist==1.5.0
fsspec==2024.9.0
gguf==0.10.0
google-api-core==2.24.1
google-auth==2.38.0
googleapis-common-protos==1.67.0rc1
grpcio==1.70.0
h11==0.14.0
hf_transfer==0.1.9
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.28.1
idna==3.10
importlib_metadata==8.6.1
iniconfig==2.0.0
interegular==0.3.3
Jinja2==3.1.5
jiter==0.8.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
lm-format-enforcer==0.10.9
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
mistral_common==1.5.3
modelscope==1.22.3
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-ml-py==12.570.86
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
openai==1.61.1
opencensus==0.11.4
opencensus-context==0.1.3
opencv-python-headless==4.11.0.86
outlines==0.1.11
outlines_core==0.1.26
packaging==24.2
pandas==2.2.3
partial-json-parser==0.2.1.1.post5
peft==0.14.0
pillow==11.1.0
pip==25.0
platformdirs==4.3.6
pluggy==1.5.0
prometheus_client==0.21.1
prometheus-fastapi-instrumentator==7.0.2
propcache==0.2.1
proto-plus==1.26.0
protobuf==3.20.3
psutil==6.1.1
py-cpuinfo==9.0.0
py-spy==0.4.0
pyarrow==19.0.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pybind11==2.13.6
pycountry==24.6.1
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
pytest==8.3.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2025.1
PyYAML==6.0.2
pyzmq==26.2.1
ray==2.42.1
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.22.3
rsa==4.9
safetensors==0.5.2
sentencepiece==0.2.0
setuptools==75.8.0
shtab==1.7.1
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
starlette==0.45.3
sympy==1.13.1
tiktoken==0.8.0
tokenizers==0.21.0
tomli==2.2.1
torch==2.5.1
torchaudio==2.5.1
torchvision==0.20.1
tqdm==4.67.1
transformers==4.48.3
triton==3.1.0
trl==0.14.0
typeguard==4.4.1
typing_extensions==4.12.2
tyro==0.9.14
tzdata==2025.1
unsloth==2025.2.4
unsloth_zoo==2025.2.3
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
virtualenv==20.29.2
vllm==0.7.2
watchfiles==1.0.4
websockets==14.2
wheel==0.45.1
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.11
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0

Install the dependencies:

pip install -r requirements.txt
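With this many pinned packages, it can be worth checking that the active environment matches the pins. The sketch below uses only the standard library; the helper names `parse_pins` and `find_mismatches` are made up for this illustration, and it assumes every relevant line has the simple `name==version` form used in the file above.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, skipping blanks, comments, and other forms."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.is_file():
        pins = parse_pins(requirements.read_text(encoding="utf-8"))
        for name, (pinned, installed) in find_mismatches(pins).items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

The comparison is an exact string match, so a build with a local version suffix (the training log reports Torch as 2.5.1+cu124, for example) may be flagged even though it satisfies the pin.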

Python packages after installation:

(unsloth_sft) ubuntu@ubuntu-server:~$ pip list
Package                           Version
--------------------------------- -------------
accelerate                        1.3.0
aiohappyeyeballs                  2.4.6
aiohttp                           3.11.12
aiohttp-cors                      0.7.0
aiosignal                         1.3.2
airportsdata                      20241001
annotated-types                   0.7.0
anyio                             4.8.0
astor                             0.8.1
async-timeout                     5.0.1
attrs                             25.1.0
bitsandbytes                      0.45.2
blake3                            1.0.4
cachetools                        5.5.1
certifi                           2025.1.31
charset-normalizer                3.4.1
click                             8.1.8
cloudpickle                       3.1.1
colorful                          0.5.6
compressed-tensors                0.9.1
cut-cross-entropy                 25.1.1
datasets                          3.2.0
depyf                             0.18.0
dill                              0.3.8
diskcache                         5.6.3
distlib                           0.3.9
distro                            1.9.0
docstring_parser                  0.16
einops                            0.8.1
exceptiongroup                    1.2.2
fastapi                           0.115.8
filelock                          3.17.0
frozenlist                        1.5.0
fsspec                            2024.9.0
gguf                              0.10.0
google-api-core                   2.24.1
google-auth                       2.38.0
googleapis-common-protos          1.67.0rc1
grpcio                            1.70.0
h11                               0.14.0
hf_transfer                       0.1.9
httpcore                          1.0.7
httptools                         0.6.4
httpx                             0.28.1
huggingface-hub                   0.28.1
idna                              3.10
importlib_metadata                8.6.1
iniconfig                         2.0.0
interegular                       0.3.3
Jinja2                            3.1.5
jiter                             0.8.2
jsonschema                        4.23.0
jsonschema-specifications         2024.10.1
lark                              1.2.2
lm-format-enforcer                0.10.9
markdown-it-py                    3.0.0
MarkupSafe                        3.0.2
mdurl                             0.1.2
mistral_common                    1.5.3
modelscope                        1.22.3
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.19.0
multidict                         6.1.0
multiprocess                      0.70.16
nest-asyncio                      1.6.0
networkx                          3.4.2
numpy                             1.26.4
nvidia-cublas-cu12                12.4.5.8
nvidia-cuda-cupti-cu12            12.4.127
nvidia-cuda-nvrtc-cu12            12.4.127
nvidia-cuda-runtime-cu12          12.4.127
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.2.1.3
nvidia-curand-cu12                10.3.5.147
nvidia-cusolver-cu12              11.6.1.9
nvidia-cusparse-cu12              12.3.1.170
nvidia-cusparselt-cu12            0.6.2
nvidia-ml-py                      12.570.86
nvidia-nccl-cu12                  2.21.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.4.127
openai                            1.61.1
opencensus                        0.11.4
opencensus-context                0.1.3
opencv-python-headless            4.11.0.86
outlines                          0.1.11
outlines_core                     0.1.26
packaging                         24.2
pandas                            2.2.3
partial-json-parser               0.2.1.1.post5
peft                              0.14.0
pillow                            11.1.0
pip                               25.0
platformdirs                      4.3.6
pluggy                            1.5.0
prometheus_client                 0.21.1
prometheus-fastapi-instrumentator 7.0.2
propcache                         0.2.1
proto-plus                        1.26.0
protobuf                          3.20.3
psutil                            6.1.1
py-cpuinfo                        9.0.0
py-spy                            0.4.0
pyarrow                           19.0.0
pyasn1                            0.6.1
pyasn1_modules                    0.4.1
pybind11                          2.13.6
pycountry                         24.6.1
pydantic                          2.10.6
pydantic_core                     2.27.2
Pygments                          2.19.1
pytest                            8.3.4
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
pytz                              2025.1
PyYAML                            6.0.2
pyzmq                             26.2.1
ray                               2.42.1
referencing                       0.36.2
regex                             2024.11.6
requests                          2.32.3
rich                              13.9.4
rpds-py                           0.22.3
rsa                               4.9
safetensors                       0.5.2
sentencepiece                     0.2.0
setuptools                        75.8.0
shtab                             1.7.1
six                               1.17.0
smart-open                        7.1.0
sniffio                           1.3.1
starlette                         0.45.3
sympy                             1.13.1
tiktoken                          0.8.0
tokenizers                        0.21.0
tomli                             2.2.1
torch                             2.5.1
torchaudio                        2.5.1
torchvision                       0.20.1
tqdm                              4.67.1
transformers                      4.48.3
triton                            3.1.0
trl                               0.14.0
typeguard                         4.4.1
typing_extensions                 4.12.2
tyro                              0.9.14
tzdata                            2025.1
unsloth                           2025.2.4
unsloth_zoo                       2025.2.3
urllib3                           2.3.0
uvicorn                           0.34.0
uvloop                            0.21.0
virtualenv                        20.29.2
vllm                              0.7.2
watchfiles                        1.0.4
websockets                        14.2
wheel                             0.45.1
wrapt                             1.17.2
xformers                          0.0.28.post3
xgrammar                          0.1.11
xxhash                            3.5.0
yarl                              1.18.3
zipp                              3.21.0
(unsloth_sft) ubuntu@ubuntu-server:~$

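Prepare the dataset

Dataset download page:

https://modelscope.cn/datasets/FreedomIntelligence/medical-o1-reasoning-SFT/files

The dataset can be downloaded with the modelscope command-line tool.

Install the modelscope package:

pip install modelscope

Download the dataset with the following command: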

modelscope download --dataset FreedomIntelligence/medical-o1-reasoning-SFT --local_dir ./medical-o1-reasoning-SFT

After the download completes:

(unsloth_sft) ubuntu@ubuntu-server:~/datasets/medical-o1-reasoning-SFT$ pwd
/home/ubuntu/datasets/medical-o1-reasoning-SFT
(unsloth_sft) ubuntu@ubuntu-server:~/datasets/medical-o1-reasoning-SFT$ ll
total 135748
drwxrwxr-x 2 ubuntu ubuntu     4096 Feb 12 18:24 ./
drwxrwxr-x 4 ubuntu ubuntu     4096 Feb 12 18:22 ../
-rw-rw-r-- 1 ubuntu ubuntu 64814379 Feb 12 18:23 medical_o1_sft_Chinese.json
-rw-rw-r-- 1 ubuntu ubuntu 74078226 Feb 12 18:23 medical_o1_sft.json
-rw-rw-r-- 1 ubuntu ubuntu    99913 Feb 12 18:23 README.md
(unsloth_sft) ubuntu@ubuntu-server:~/datasets/medical-o1-reasoning-SFT$
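Before training, a quick structural check of the downloaded file can catch a truncated or malformed download. This is a minimal sketch that assumes the file is a single JSON array of objects; the field names come from the dataset summary printed by the training script, and `check_records` is a name made up for this illustration.

```python
import json
from pathlib import Path

# Field names as reported by the training script's dataset summary
EXPECTED_FIELDS = {"Question", "Complex_CoT", "Response"}


def check_records(path: Path) -> int:
    """Load a JSON array of records, verify the expected fields, return the count."""
    with path.open(encoding="utf-8") as f:
        records = json.load(f)
    for i, record in enumerate(records):
        missing = EXPECTED_FIELDS - set(record)
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    return len(records)


if __name__ == "__main__":
    dataset_file = Path("medical_o1_sft_Chinese.json")
    if dataset_file.is_file():
        print(f"{check_records(dataset_file)} records look well-formed")
```

If the file turns out to be JSON Lines instead, `json.load` will raise an error and the loader would need to read it line by line.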

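Prepare the model files

Download the model files with the following command. Because --local_dir is set to the current directory, run it from the directory where the model should be stored (here /home/ubuntu/model/DeepSeek-R1-Distill-Llama-8B):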

modelscope download --model unsloth/DeepSeek-R1-Distill-Llama-8B --local_dir .

After the download completes:

(unsloth_sft) ubuntu@ubuntu-server:~/model/DeepSeek-R1-Distill-Llama-8B$ pwd
/home/ubuntu/model/DeepSeek-R1-Distill-Llama-8B
(unsloth_sft) ubuntu@ubuntu-server:~/model/DeepSeek-R1-Distill-Llama-8B$ ll -h
total 15G
drwxrwxr-x 3 ubuntu ubuntu 4.0K Feb 12 17:43 ./
drwxrwxr-x 7 ubuntu ubuntu 4.0K Feb 12 16:29 ../
-rw-rw-r-- 1 ubuntu ubuntu  959 Feb 12 16:44 config.json
-rw-rw-r-- 1 ubuntu ubuntu   73 Feb 12 16:44 configuration.json
-rw-rw-r-- 1 ubuntu ubuntu  236 Feb 12 16:44 generation_config.json
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Feb 12 17:05 model-00001-of-00004.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Feb 12 17:23 model-00002-of-00004.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 4.6G Feb 12 17:39 model-00003-of-00004.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 1.1G Feb 12 17:43 model-00004-of-00004.safetensors
-rw-rw-r-- 1 ubuntu ubuntu  24K Feb 12 17:43 model.safetensors.index.json
-rw------- 1 ubuntu ubuntu  952 Feb 12 17:43 .msc
-rw-rw-r-- 1 ubuntu ubuntu   36 Feb 12 17:43 .mv
-rw-rw-r-- 1 ubuntu ubuntu  16K Feb 12 17:43 README.md
-rw-rw-r-- 1 ubuntu ubuntu  483 Feb 12 17:43 special_tokens_map.json
drwxrwxr-x 2 ubuntu ubuntu 4.0K Feb 12 17:43 ._____temp/
-rw-rw-r-- 1 ubuntu ubuntu  52K Feb 12 17:43 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu  17M Feb 12 17:43 tokenizer.json
(unsloth_sft) ubuntu@ubuntu-server:~/model/DeepSeek-R1-Distill-Llama-8B$
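The download took about an hour according to the timestamps, so it is worth confirming that every weight shard arrived. The sketch below reads model.safetensors.index.json and checks that each shard it references exists on disk. It assumes the index has the usual `weight_map` layout (tensor name mapped to shard file name); `missing_shards` is a name made up for this illustration.

```python
import json
from pathlib import Path
from typing import List


def missing_shards(model_dir: Path) -> List[str]:
    """Return shard files named in the safetensors index that are absent on disk."""
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # weight_map maps each tensor name to the shard file that stores it
    shard_names = sorted(set(index["weight_map"].values()))
    return [name for name in shard_names if not (model_dir / name).is_file()]


if __name__ == "__main__":
    model_dir = Path("/home/ubuntu/model/DeepSeek-R1-Distill-Llama-8B")
    if (model_dir / "model.safetensors.index.json").is_file():
        missing = missing_shards(model_dir)
        print("all shards present" if not missing else f"missing: {missing}")
```

This only checks that the files exist. It does not verify their sizes or checksums.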

Model fine-tuning

Write the training script train.py:

$ cat train.py 

from datasets import load_dataset

from unsloth import FastLanguageModel
from unsloth import is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments

# Path to the local dataset directory
dataset_path = "/home/ubuntu/datasets/medical-o1-reasoning-SFT"

# Load the local dataset (the Chinese file of medical-o1-reasoning-SFT)
dataset = load_dataset(
    path=dataset_path,
    data_files=["medical_o1_sft_Chinese.json"],
    split="train",
)

print(dataset)

# Dataset summary printed by the line above:
# Dataset({
#     features: ['Question', 'Complex_CoT', 'Response'],
#     num_rows: 24772
# })

# features = dataset.features

# print(features)
# print(type(features))

# data = dataset[:5]

# print(data)

# question = dataset.features["Question"]

# print(question)



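# Prompt used for inference. Note: unlike train_prompt_style below, the instruction
# line and the <think> tag here carry a literal "# " prefix inside the string. It is
# kept as-is because the logged outputs in this article were produced with it.
prompt_style = """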
Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
# You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. 

### Question:
{}

### Response:
# <think>{}"""

train_prompt_style = """
Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""


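# Test question, kept in Chinese to match the Chinese dataset. In English: "A patient
# with acute appendicitis has been ill for 5 days; the abdominal pain has eased slightly
# but the fever persists, and physical examination finds a tender mass in the right
# lower abdomen. How should this be managed at this point?"
question = "一个患有急性阑尾炎的病人已经发病5天，腹痛稍有减轻但仍然发热，在体检时发现右下腹有压痛的包块，此时应如何处理?"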

max_seq_length = 2048
max_lora_rank = 32

# Load the base model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/home/ubuntu/model/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
    # fast_inference=True,
    # max_lora_rank=max_lora_rank,
    # gpu_memory_utilization=0.6,
)

# Run inference with the base model before fine-tuning
FastLanguageModel.for_inference(model=model)

inputs = tokenizer(prompt_style.format(question, ""), return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1300,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)    

print(response[0].split("### Response:")[1])

# ########################################################
# (unsloth) (base) ubuntu@ubuntu-server:~/code/unsloth_train$ /home/ubuntu/miniconda3/envs/unsloth/bin/python /home/ubuntu/code/unsloth_train/train.py
# 🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
# 🦥 Unsloth Zoo will now patch everything to make training faster!
# INFO 02-13 09:35:53 __init__.py:190] Automatically detected platform cuda.
# Repo card metadata block was not found. Setting CardData to empty.
# Dataset({
#     features: ['Question', 'Complex_CoT', 'Response'],
#     num_rows: 24772
# })
# ==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.3.
#    \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.635 GB. Platform: Linux.
# O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0
# \        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
#  "-____-"     Free Apache license: http://github.com/unslothai/unsloth
# Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
# Loading checkpoint shards: 100%|██████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.89s/it]

# # <think>
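# Hmm, I need to manage a patient with acute appendicitis who has been ill for 5 days; the abdominal pain has eased slightly but he still has a fever. On physical examination there is a tender mass in the right lower abdomen. This is a clinical situation that needs careful thought.

# First, common symptoms of acute appendicitis include fever, abdominal pain, fever, and abdominal tenderness, especially in the right lower abdomen. The appearance of a mass may suggest that the appendicitis has progressed and may have formed glandular or other structural changes. But it could also be a misdiagnosis, such as hepatitis, liver abscess, or gastrointestinal obstruction.

# Next, given that the patient has been ill for 5 days and the abdominal pain has eased while the fever remains, the inflammation may be subsiding but has not fully resolved. The mass may indicate local inflammatory change, but further evaluation is needed.

# I should consider an abdominal ultrasound to assess the nature of the mass. Ultrasound can help determine whether the mass is solid or non-solid, and whether there is fluid, glandular change, or other structural abnormality. It can also check whether other organs, such as the liver or gallbladder, are involved.

# If ultrasound shows fluid around the appendix, or obvious glandular changes, then endoscopic drainage or surgery may need to be considered. But if the mass is non-solid, or accompanied by other structural abnormalities, further imaging such as CT or MRI may be needed.

# Meanwhile, drug therapy remains the main treatment. Antibiotics should be chosen according to susceptibility results. If a mass has formed, it may be necessary to consider whether the antibiotic choice adequately covers the likely pathogens, for example an extended-spectrum penicillin, a third-generation cephalosporin, or fluconazole if fungal infection is suspected.

# In addition, since the patient has had 5 days of fever, there may be delayed infection or complications such as septic shock or multiple organ dysfunction, so the patient's vital signs and metabolic indicators need close monitoring, with supportive treatment when necessary.

# In summary, the steps should be:

# 1. Perform an abdominal ultrasound to determine the nature of the mass and its surroundings.
# 2. Decide from the ultrasound result whether further imaging such as CT or MRI is needed.
# 3. Choose suitable antibiotics according to susceptibility results, possibly considering the cause of the mass, such as fungal infection.
# 4. Monitor the patient's clinical condition and adjust the treatment plan promptly.
# 5. Consider endoscopic drainage or surgery when necessary.

# This allows a full assessment of the condition and a suitable treatment plan, ensuring the patient's improvement.
# </think>

# For a patient with acute appendicitis whose abdominal pain has eased slightly but who still has a fever, and in whom a tender mass is found in the right lower abdomen, the recommended steps are:

# 1. **Abdominal ultrasound**: start with an abdominal ultrasound to determine the nature of the mass. Check whether it is solid or non-solid, and whether there is fluid, glandular change, or other abnormal structure.

# 2. **Further imaging**: if the ultrasound shows a solid mass or obvious structural abnormality, consider CT or MRI to further assess the nature of the mass and the surrounding tissue.

# 3. **Drug therapy**: choose antibiotics according to susceptibility results, such as an extended-spectrum penicillin, a third-generation cephalosporin, or fluconazole, to cover the likely pathogens.

# 4. **Monitoring**: closely observe the patient's vital signs and metabolic indicators, and adjust the treatment plan promptly to prevent complications such as septic shock.

# 5. **Surgery or endoscopic drainage**: if the mass is accompanied by glandular change or other structural abnormality, consider endoscopic drainage or surgery when necessary.

# These steps allow a full assessment of the condition and a suitable treatment plan, ensuring the patient's improvement.<｜end▁of▁sentence｜>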
# (unsloth) (base) ubuntu@ubuntu-server:~/code/unsloth_train$ 
# ########################################################

EOS_TOKEN = tokenizer.eos_token  # Must be appended so the model learns where to stop


# Build one training text per record: prompt + question + chain of thought + response + EOS
def formatting_prompts_func(examples):
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question_text, cot, response_text in zip(questions, cots, responses):
        text = train_prompt_style.format(question_text, cot, response_text) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)

print(dataset)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 600,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
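# Run LoRA fine-tuning for max_steps optimizer steps
trainer.train()

# Save the LoRA adapter and tokenizer files to ./model_20250226
trainer.save_model("model_20250226")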
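To get a feel for how small the LoRA adapter configured in `get_peft_model` is, the number of trainable parameters can be worked out by hand. The sketch below is only an estimate: the layer shapes are my assumption of the standard Llama 3.1 8B dimensions (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers) and are not stated in this article, so check them against config.json before relying on the result.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# ASSUMED shapes for a Llama 3.1 8B style model; verify against config.json
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) for each module listed in target_modules
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def total_lora_params(r: int) -> int:
    """Total LoRA parameters across all target modules and all layers."""
    per_layer = sum(lora_params(i, o, r) for i, o in MODULE_SHAPES.values())
    return per_layer * LAYERS


print(f"{total_lora_params(16):,}")  # 41,943,040
```

Under these assumptions, r = 16 trains roughly 42 million parameters, well under 1% of an 8-billion-parameter model, which is consistent with the modest GPU memory requirement reported above.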

Run the script to start training:

python train.py
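It helps to know how much of the dataset a given max_steps actually covers. A minimal sketch, assuming a single GPU and the values from train.py (per_device_train_batch_size = 2, gradient_accumulation_steps = 4, 24,772 rows); `epochs_covered` is a name made up for this illustration.

```python
def epochs_covered(
    max_steps: int,
    per_device_batch_size: int,
    grad_accum_steps: int,
    num_rows: int,
    num_devices: int = 1,
) -> float:
    """Return how many passes over the dataset max_steps optimizer steps amount to."""
    # Each optimizer step consumes batch_size * accumulation_steps samples per device
    samples_seen = max_steps * per_device_batch_size * grad_accum_steps * num_devices
    return samples_seen / num_rows


for steps in (600, 10_000):
    print(f"{steps} steps: {epochs_covered(steps, 2, 4, 24_772):.2f} epochs")
# 600 steps: 0.19 epochs
# 10000 steps: 3.23 epochs
```

So the 600-step run used for the comparison in this article sees only about a fifth of the training data once, while the 10,000-step run makes a little over three passes.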


GPU usage during training:

[Figure: GPU usage during training]

Directory contents after training completes:

(unsloth_sft) ubuntu@ubuntu-server:~/train$ tree
.
├── model_20250226
│   ├── adapter_config.json
│   ├── adapter_model.safetensors
│   ├── README.md
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   └── training_args.bin
├── outputs
│   ├── checkpoint-500
│   │   ├── adapter_config.json
│   │   ├── adapter_model.safetensors
│   │   ├── optimizer.pt
│   │   ├── README.md
│   │   ├── rng_state.pth
│   │   ├── scheduler.pt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   ├── tokenizer.json
│   │   ├── trainer_state.json
│   │   └── training_args.bin
│   └── checkpoint-600
│       ├── adapter_config.json
│       ├── adapter_model.safetensors
│       ├── optimizer.pt
│       ├── README.md
│       ├── rng_state.pth
│       ├── scheduler.pt
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       ├── trainer_state.json
│       └── training_args.bin
├── requirements.txt
└── train.py

4 directories, 31 files
(unsloth_sft) ubuntu@ubuntu-server:~/train$
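Each checkpoint directory contains a trainer_state.json, which is a convenient place to read the loss curve from after the run. The sketch below assumes the file has the usual `log_history` list whose entries carry `step` and `loss` keys; `read_loss_history` is a name made up for this illustration.

```python
import json
from pathlib import Path
from typing import List, Tuple


def read_loss_history(checkpoint_dir: Path) -> List[Tuple[int, float]]:
    """Return (step, loss) pairs from a checkpoint's trainer_state.json."""
    state_path = checkpoint_dir / "trainer_state.json"
    state = json.loads(state_path.read_text(encoding="utf-8"))
    # Some log entries (for example the final summary) carry no "loss" key; skip them
    return [
        (entry["step"], entry["loss"])
        for entry in state.get("log_history", [])
        if "step" in entry and "loss" in entry
    ]


if __name__ == "__main__":
    checkpoint = Path("outputs/checkpoint-600")
    if (checkpoint / "trainer_state.json").is_file():
        history = read_loss_history(checkpoint)
        if history:
            print(f"first: {history[0]}, last: {history[-1]}")
```

Because train.py sets logging_steps = 1, there should be one entry per training step, which makes it easy to plot or to compare the first and last values.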

Model inference

Original model inference

Source code

(unsloth_sft) ubuntu@ubuntu-server:~/train$ cat inference1.py 
from unsloth import FastLanguageModel


max_seq_length = 2048
max_lora_rank = 32
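# Test question in Chinese: how to manage a patient with acute appendicitis, ill for 5 days,
# with persistent fever and a tender mass in the right lower abdomen
question = "一个患有急性阑尾炎的病人已经发病5天，腹痛稍有减轻但仍然发热，在体检时发现右下腹有压痛的包块，此时应如何处理?"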

prompt_style = """
Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
# You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. 

### Question:
{}

### Response:
# <think>{}"""


# Load the original (base) model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/home/ubuntu/model/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
    # fast_inference=True,
    # max_lora_rank=max_lora_rank,
    # gpu_memory_utilization=0.6,
)

# Run inference
FastLanguageModel.for_inference(model=model)

inputs = tokenizer(prompt_style.format(question, ""), return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1300,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)    

print(response[0].split("### Response:")[1])

(unsloth_sft) ubuntu@ubuntu-server:~/train$

Inference result

(unsloth_sft) ubuntu@ubuntu-server:~/train$ python inference1.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO 02-26 20:20:00 __init__.py:190] Automatically detected platform cuda.
==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.635 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.56it/s]

# <think>
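Hmm, I need to manage a patient with acute appendicitis. This patient has been ill for 5 days; the abdominal pain has eased slightly but the fever persists. On physical examination a tender mass was found in the right lower abdomen. First, I need to think about the common symptoms of acute appendicitis and how it is managed.

Acute appendicitis usually presents as paroxysmal pain in the right lower abdomen, worse especially after meals or on an empty stomach, accompanied by fever, nausea, and vomiting, and possibly diarrhea. The pain is usually one-sided, but sometimes it may not be obvious on either side. This patient has been ill for five days; although the abdominal pain has eased, the fever persists, which may mean the inflammation is subsiding but has not fully resolved.

Now, the examination found a tender mass in the right lower abdomen, which puzzles me a little. Acute appendicitis is usually not obvious during the acute phase, or there may be mild tenderness, but the appearance of a mass may mean the inflammation has spread to some structure, such as the liver or spleen, or that there is a complication such as an abscess or another infection.

First, I need to confirm whether appendicitis is really present, because a mass can sometimes point to other problems, such as hepatitis, liver abscess, splenitis, a complication of appendicitis, or other peritoneal inflammation. So the first consideration is whether further imaging, such as ultrasound, CT, or MRI, is needed to confirm the nature of the mass.

If the mass is an abscess, management would be antibiotic therapy, possibly with drainage. Or, if it is a tender mass of the liver, hepatitis may need to be treated. However, acute appendicitis does not usually cause liver problems directly unless there are complications.

In addition, the appearance of a mass may mean that the infection has spread, or that the appendicitis has turned into another form. So, besides considering whether further diagnosis is needed, the patient's overall condition should be assessed, such as whether there is fever, an elevated white blood cell count, or elevated C-reactive protein.

In this situation, I think a full set of physical and laboratory examinations should come first, including a complete blood count, C-reactive protein, tuberculin test, blood culture, and abdominal ultrasound, to confirm whether there is an abscess, hepatitis, or another complication.

If an abscess is confirmed, the choice of antibiotics and the possibility of drainage need to be considered. If there are other complications, such as hepatitis or splenitis, they also need corresponding treatment.

In addition, an assessment by gastrointestinal surgery may need to be considered, especially since a mass located in the right lower abdomen may need further exploration.

At the same time, given that the patient has been ill for 5 days and the abdominal pain has eased somewhat while the fever persists, this may suggest that inflammation is accumulating; continued observation and supportive treatment may be needed while preparing for further diagnosis.

Overall, the key to handling this situation is to confirm the diagnosis first and then choose a treatment plan according to the specifics, which may require combining the opinions of several medical specialists.
</think>

Based on the patient's symptoms and examination results, the following steps are recommended:

1. **Further diagnosis**: perform an abdominal ultrasound to clarify the nature of the mass and determine whether it is an abscess, hepatitis, or another structure.
2. **Laboratory tests**: including a complete blood count, C-reactive protein, tuberculin test, blood culture, and liver and kidney function tests, to assess the infection.
3. **Antibiotic therapy**: based on the laboratory results, start treatment with a suitable antibiotic that covers the likely bacterial colonies.
4. **Imaging and drainage**: if an abscess is confirmed, consider laparoscopic drainage or surgery.
5. **Combined treatment**: combine antibiotic therapy with supportive care such as nutritional supplementation and pain management.
6. **Monitoring and follow-up**: re-examine regularly, monitor disease progression, and assess the treatment effect.

In addition, it is advisable to consult several medical specialists and draw up an individualized treatment plan.<｜end▁of▁sentence｜>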
(unsloth_sft) ubuntu@ubuntu-server:~/train$
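The script prints everything after "### Response:", so the chain of thought, the final answer, and the end-of-sentence token all arrive in one string. For comparing answers it can be handy to separate them. The sketch below is plain string handling; `split_reasoning` is a name made up for this illustration, and the EOS string is copied from the logged output above rather than read from the tokenizer.

```python
from typing import Tuple

# End-of-sentence marker as it appears in the logged output (assumed fixed here)
EOS_TOKEN = "<｜end▁of▁sentence｜>"


def split_reasoning(generated: str, eos_token: str = EOS_TOKEN) -> Tuple[str, str]:
    """Split generated text into (reasoning, answer) around the </think> tag."""
    # Keep only the part after the response marker, if the marker is present
    text = generated.split("### Response:", 1)[-1]
    text = text.replace(eos_token, "")
    reasoning, closing_tag, answer = text.partition("</think>")
    if not closing_tag:
        # No closing tag: treat the whole text as the answer
        return "", text.strip()
    # Drop the opening tag and the literal "# " prefix that the prompt puts before it
    reasoning = reasoning.replace("<think>", "").strip().lstrip("#").strip()
    return reasoning, answer.strip()


sample = "### Response:\n# <think>\nstep one\n</think>\nFinal answer." + EOS_TOKEN
print(split_reasoning(sample))  # ('step one', 'Final answer.')
```

In a real script it would be more robust to take the marker from `tokenizer.eos_token`, or to decode with special tokens skipped, instead of hard-coding the string.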

Fine-tuned model inference

Source code

(unsloth_sft) ubuntu@ubuntu-server:~/train$ cat inference2.py 
from unsloth import FastLanguageModel


max_seq_length = 2048
max_lora_rank = 32
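# Test question in Chinese: how to manage a patient with acute appendicitis, ill for 5 days,
# with persistent fever and a tender mass in the right lower abdomen
question = "一个患有急性阑尾炎的病人已经发病5天，腹痛稍有减轻但仍然发热，在体检时发现右下腹有压痛的包块，此时应如何处理?"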

prompt_style = """
Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
# You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. 

### Question:
{}

### Response:
# <think>{}"""


# Load the fine-tuned model (the LoRA adapter directory saved by train.py)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./model_20250226",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
    # fast_inference=True,
    # max_lora_rank=max_lora_rank,
    # gpu_memory_utilization=0.6,
)

# Run inference
FastLanguageModel.for_inference(model=model)

inputs = tokenizer(prompt_style.format(question, ""), return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1300,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)    

print(response[0].split("### Response:")[1])

(unsloth_sft) ubuntu@ubuntu-server:~/train$
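Note that ./model_20250226 holds only adapter and tokenizer files, not the full weights, so the base model still has to be found at load time. As far as I understand, the adapter's adapter_config.json records which base model it belongs to. The sketch below simply prints the relevant fields; the key names are the ones PEFT normally writes, which I am assuming rather than reading from this particular file, and `describe_adapter` is a name made up for this illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Keys that PEFT adapter configs normally contain (assumed, not confirmed by the article)
KEYS_OF_INTEREST = ("base_model_name_or_path", "r", "lora_alpha", "target_modules")


def describe_adapter(adapter_dir: Path) -> Dict[str, Any]:
    """Return the base model path and main LoRA settings from adapter_config.json."""
    config_path = adapter_dir / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Missing keys come back as None instead of raising
    return {key: config.get(key) for key in KEYS_OF_INTEREST}


if __name__ == "__main__":
    adapter_dir = Path("./model_20250226")
    if (adapter_dir / "adapter_config.json").is_file():
        for key, value in describe_adapter(adapter_dir).items():
            print(f"{key}: {value}")
```

If the base model path recorded there is an absolute local path, moving the base model directory later would presumably break loading of the adapter.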

Inference result

(unsloth_sft) ubuntu@ubuntu-server:~/train$ python inference2.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO 02-26 20:14:40 __init__.py:190] Automatically detected platform cuda.
==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.635 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.57it/s]
Unsloth 2025.2.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.

# <think>
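Oh, this patient has had acute appendicitis for 5 days; the abdominal pain has eased slightly, but the fever is still there. The examination found a tender mass in the right lower abdomen, which sounds very much like a typical presentation of appendicitis. Oh, it is probably because the appendicitis caused inflammation, which led to the mass.

I recall that acute appendicitis generally does not need immediate surgery; it is usually managed with observation and medication. However, this patient's situation is a bit complicated, and the presence of the mass may mean the inflammation has progressed to some degree.

If the mass is fairly large, or markedly tender, surgery may need to be considered. After all, if the mass is too large it might compress other important organs, such as the kidney, which would be trouble.

But wait, although the mass is there, the patient is not yet completely free of pain. The abdominal pain has eased slightly but the fever persists; this may just be a stage in the course of the inflammation that has not yet reached the point of needing surgery.

So perhaps we can observe first and consider surgery once the condition develops further or stabilizes. Meanwhile, continue antibiotics and analgesics to relieve the symptoms.

Thinking about it this way, not rushing into surgery and continuing observation and treatment seems the more reasonable choice. Hopefully the patient improves and avoids unnecessary surgical risk.
</think>
Based on the patient's symptoms and physical examination, although there is a tender mass in the right lower abdomen, the patient's symptoms have not yet fully stabilized: the abdominal pain has eased only slightly and the fever persists. In this situation, immediate surgery is usually not needed. Typically, acute appendicitis can be relieved through observation, medication, and nutritional support. However, the presence of the mass may indicate that the inflammation has progressed to a certain degree and needs close monitoring.

In this case, it is advisable to continue antibiotics and analgesics to relieve the symptoms and to watch closely for changes in the condition. If the condition worsens, for example if the mass enlarges, the abdominal pain intensifies, or there is persistent fever, vomiting, or yellowing, further management may need to be considered, including imaging or surgery.

In summary, the current strategy is to continue observation and treatment while closely monitoring changes in the condition, so that the treatment plan can be adjusted promptly. Hopefully the patient recovers soon.<｜end▁of▁sentence｜>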
(unsloth_sft) ubuntu@ubuntu-server:~/train$