Qwen2模型Text2SQL微调​以及GGUF量化

news2025/1/15 20:39:46

Qwen2-1.5B微调

准备python环境

conda create --name llama_factory python=3.11
conda activate llama_factory

部署llama-factory

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip3 install -e ".[torch,metrics]"
# 如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 bitsandbytes 库
pip3 install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
# 安装pytorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# 如启动报错,出现ImportError: cannot import name 'get_full_repo_name' from 'huggingface_hub',需安装chardet
pip3 install chardet

执行python src/webui.py​启动:
在这里插入图片描述

Text2SQL微调

(1)准备样本数据集,如:

{
    "db_id": "department_management",
    "instruction": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\ndepartment_management contains tables such as department, head, management. Table department has columns such as Department_ID, Name, Creation, Ranking, Budget_in_Billions, Num_Employees. Department_ID is the primary key.\nTable head has columns such as head_ID, name, born_state, age. head_ID is the primary key.\nTable management has columns such as department_ID, head_ID, temporary_acting. department_ID is the primary key.\nThe head_ID of management is the foreign key of head_ID of head.\nThe department_ID of management is the foreign key of Department_ID of department.\n\n",
    "input": "###Input:\nHow many heads of the departments are older than 56 ?\n\n###Response:",
    "output": "SELECT count(*) FROM head WHERE age  >  56",
    "history": []
}

(2)添加数据集。将数据集json文件复制到LLaMA-Factory/data目录下,在dataset_info.json中添加如下内容:

"text2sql_train": {
    "file_name": "text2sql_train.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "history": "history"
    }
},
"text2sql_dev": {
    "file_name": "text2sql_dev.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "history": "history"
    }
}

(3)配置llama-factory页面上的各参数,预览命令如下(根据自己的实际情况调整参数):

llamafactory-cli train `
    --stage sft `
    --do_train True `
    --model_name_or_path D:\\LLM\\Qwen2-1.5B-Instruct `
    --preprocessing_num_workers 16 `
    --finetuning_type lora `
    --quantization_method bitsandbytes `
    --template qwen `
    --flash_attn auto `
    --dataset_dir D:\\python_project\\LLaMA-Factory\\data `
    --dataset text2sql_train `
    --cutoff_len 1024 `
    --learning_rate 0.0001 `
    --num_train_epochs 2.0 `
    --max_samples 100000 `
    --per_device_train_batch_size 2 `
    --gradient_accumulation_steps 4 `
    --lr_scheduler_type cosine `
    --max_grad_norm 1.0 `
    --logging_steps 20 `
    --save_steps 500 `
    --warmup_steps 0 `
    --optim adamw_torch `
    --packing False `
    --report_to none `
    --output_dir saves\Qwen2-1.5B-Chat\lora\train_2024-07-19-19-45-59 `
    --bf16 True `
    --plot_loss True `
    --ddp_timeout 180000000 `
    --include_num_input_tokens_seen True `
    --lora_rank 8 `
    --lora_alpha 16 `
    --lora_dropout 0 `
    --lora_target all

(4)微调后,对测试集数据进行推理评估:

在这里插入图片描述

{
    "predict_bleu-4": 88.43791015473889,
    "predict_rouge-1": 92.31425483558995,
    "predict_rouge-2": 85.43010570599614,
    "predict_rouge-l": 89.06327794970986,
    "predict_runtime": 1027.4111,
    "predict_samples_per_second": 1.006,
    "predict_steps_per_second": 0.503
}

从以上的指标数据看,模型的效果还是挺不错的,大家可以基于自己的样本数据自行设置各个参数。以下是以上的评估指标解读:

1. predict_bleu-4:
    * BLEU(Bilingual Evaluation Understudy)是一种常用的用于评估机器翻译质量的指标。
    * BLEU-4 表示四元语法 BLEU 分数,它衡量模型生成文本与参考文本之间的 n-gram 匹配程度,其中 n=4。
    * 值越高表示生成的文本与参考文本越相似,最大值为 1002. predict_rouge-1 和 predict_rouge-2:
    * ROUGE(Recall-Oriented Understudy for Gisting Evaluation)是一种用于评估自动摘要和文本生成模型性能的指标。
    * ROUGE-1 表示一元 ROUGE 分数,ROUGE-2 表示二元 ROUGE 分数,分别衡量模型生成文本与参考文本之间的单个词和双词序列的匹配程度。
    * 值越高表示生成的文本与参考文本越相似,最大值为 1003. predict_rouge-l:
    * ROUGE-L 衡量模型生成文本与参考文本之间最长公共子序列(Longest Common Subsequence)的匹配程度。
    * 值越高表示生成的文本与参考文本越相似,最大值为 1004. predict_runtime:
    * 预测运行时间,表示模型生成一批样本所花费的总时间。
    * 单位通常为秒。

5. predict_samples_per_second:
    * 每秒生成的样本数量,表示模型每秒钟能够生成的样本数量。
    * 通常用于评估模型的推理速度。

6. predict_steps_per_second:
    * 每秒执行的步骤数量,表示模型每秒钟能够执行的步骤数量。
    * 对于生成模型,一般指的是每秒钟执行生成操作的次数。

(5)评估感觉达到了微调预期,可以直接导出微调后的模型(注意选上微调生成的检查点):

在这里插入图片描述

GGUF模型

以上微调导出的是safetensors模型,我们可以使用llama.cpp将safetensors模型转为GGUF模型。推荐使用w64devkit+make方案部署llama.cpp。

安装make

下载地址:https://gnuwin32.sourceforge.net/packages/make.htm

在这里插入图片描述

点击框中的下载,下载安装后,把安装路径添加到环境变量PATH中。在终端,执行以下命令,将出现Make版本信息:

(llama_factory) D:\LLM\llama.cpp>make -v
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-pc-mingw32

安装MinGW

下载源文件,把mingw64/bin添加到环境变量PATH中。

在这里插入图片描述

安装w64devkit

下载地址:https://github.com/skeeto/w64devkit/releases

下载 w64devkit-fortran-1.23.0.zip后解压。注意:不要包含中文路径​。

部署llama.cpp

下载地址:https://github.com/ggerganov/llama.cpp

使用我们刚刚下载的w64devkit.exe打开llama.cpp,然后make,成功后就能得到一堆exe文件啦。

~ $ cd D:
D:/ $ cd /LLM/llama.cpp
D:/LLM/llama.cpp $ make

make成功后,安装python依赖:

conda create --name llama_cpp python=3.11
conda activate llama_cpp
pip3 install -r requirements.txt

转换为GGUF FP16格式

# model_path/mymodel为我们实际的模型路径
python convert_hf_to_gguf.py model_path/mymodel

运行日志:

(llama_cpp) D:\LLM\llama.cpp>python convert_hf_to_gguf.py D:/python_project/LLaMA-Factory/output_model/model2
INFO:hf-to-gguf:Loading model: model2
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:token_embd.weight,         torch.bfloat16 --> F16, shape = {1536, 151936}
INFO:hf-to-gguf:blk.0.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.0.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.0.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.0.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.0.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.1.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.1.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.1.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.1.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.1.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.10.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.10.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.10.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.10.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.11.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.11.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.11.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.11.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.11.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.11.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.12.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.12.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.12.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.12.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.13.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.13.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.13.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.13.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.14.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.14.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.14.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.14.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.15.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.15.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.15.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.15.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.16.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.16.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.16.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.2.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.2.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.2.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.2.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.2.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.3.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.3.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.3.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.3.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.3.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.4.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.4.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.4.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.4.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.4.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.5.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.5.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.5.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.5.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.5.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.5.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.5.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.6.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.6.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.6.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.6.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.6.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.7.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.7.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.7.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.7.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.7.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.8.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.8.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.8.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.8.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.8.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.9.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.9.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00002.safetensors'
INFO:hf-to-gguf:blk.16.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.16.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.17.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.17.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.17.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.17.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.17.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.18.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.18.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.18.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.18.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.19.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.19.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.19.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.19.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.20.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.20.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.20.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.20.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.21.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.21.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.21.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.21.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.22.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.22.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.22.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.22.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.23.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.23.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.23.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.23.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.23.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.23.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.24.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.24.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.24.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.24.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.25.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.25.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.25.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.25.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.26.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.26.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.26.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.26.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.27.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.27.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.27.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.27.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:output_norm.weight,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 1536
INFO:hf-to-gguf:gguf: feed forward length = 8960
INFO:hf-to-gguf:gguf: head count = 12
INFO:hf-to-gguf:gguf: key-value head count = 2
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:gguf.vocab:Adding 151387 merge(s).
INFO:gguf.vocab:Setting special token type eos to 151645
INFO:gguf.vocab:Setting special token type pad to 151643
INFO:gguf.vocab:Setting special token type bos to 151643
INFO:gguf.vocab:Setting chat_template to {% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system
' + system_message + '<|im_end|>
' }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user
' + content + '<|im_end|>
<|im_start|>assistant
' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '
' }}{% endif %}{% endfor %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:D:\Llm\Qwen2-1.5B-Instruct-F16.gguf: n_tensors = 338, total_size = 3.1G
Writing: 100%|██████████████████████████████████████████████████████████████████| 3.09G/3.09G [00:14<00:00, 218Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to D:\Llm\Qwen2-1.5B-Instruct-F16.gguf

模型量化

# model_path/mymodel-F16.gguf为刚刚得到的FP16模型,model_path/mymodel_Q4_k_M.gguf为要得到的量化模型,Q4_K_M为使用Q4_K_M方法
llama-quantize.exe model_path/mymodel-F16.gguf model_path/mymodel_Q4_k_M.gguf Q4_K_M

运行日志:

(llama_cpp) D:\LLM\llama.cpp>llama-quantize.exe D:/Llm/Qwen2-1.5B-Instruct-F16.gguf D:/Llm/Qwen2-1.5B-Instruct_Q4_k_M.gguf Q4_K_M
main: build = 0 (unknown)
main: built with cc (GCC) 14.1.0 for x86_64-w64-mingw32
main: quantizing 'D:/Llm/Qwen2-1.5B-Instruct-F16.gguf' to 'D:/Llm/Qwen2-1.5B-Instruct_Q4_k_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 25 key-value pairs and 338 tensors from D:/Llm/Qwen2-1.5B-Instruct-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = D:\\LLM\\Qwen2 1.5B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = D:\\LLM\\Qwen2
llama_model_loader: - kv   5:                         general.size_label str              = 1.5B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 28
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 1536
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 8960
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 12
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 1
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["臓 臓", "臓臓 臓臓", "i n", "臓 t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% set system_message = 'You are a he...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type  f16:  197 tensors
[   1/ 338]                    token_embd.weight - [ 1536, 151936,     1,     1], type =    f16, converting to q6_K .. size =   445.12 MiB ->   182.57 MiB
[   2/ 338]               blk.0.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[   3/ 338]                blk.0.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[   4/ 338]                blk.0.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[   5/ 338]                  blk.0.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[   6/ 338]                blk.0.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[   7/ 338]                    blk.0.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[   8/ 338]                  blk.0.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[   9/ 338]             blk.0.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  10/ 338]                    blk.0.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  11/ 338]                  blk.0.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  12/ 338]                    blk.0.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  13/ 338]                  blk.0.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  14/ 338]               blk.1.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  15/ 338]                blk.1.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[  16/ 338]                blk.1.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  17/ 338]                  blk.1.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  18/ 338]                blk.1.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  19/ 338]                    blk.1.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  20/ 338]                  blk.1.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  21/ 338]             blk.1.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  22/ 338]                    blk.1.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  23/ 338]                  blk.1.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  24/ 338]                    blk.1.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  25/ 338]                  blk.1.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  26/ 338]              blk.10.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  27/ 338]               blk.10.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[  28/ 338]               blk.10.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  29/ 338]                 blk.10.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  30/ 338]               blk.10.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  31/ 338]                   blk.10.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  32/ 338]                 blk.10.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  33/ 338]            blk.10.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  34/ 338]                   blk.10.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  35/ 338]                 blk.10.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  36/ 338]                   blk.10.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  37/ 338]                 blk.10.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  38/ 338]              blk.11.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  39/ 338]               blk.11.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  40/ 338]               blk.11.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  41/ 338]                 blk.11.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  42/ 338]               blk.11.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  43/ 338]                   blk.11.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  44/ 338]                 blk.11.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  45/ 338]            blk.11.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  46/ 338]                   blk.11.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  47/ 338]                 blk.11.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  48/ 338]                   blk.11.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  49/ 338]                 blk.11.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  50/ 338]              blk.12.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  51/ 338]               blk.12.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  52/ 338]               blk.12.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  53/ 338]                 blk.12.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  54/ 338]               blk.12.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  55/ 338]                   blk.12.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  56/ 338]                 blk.12.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  57/ 338]            blk.12.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  58/ 338]                   blk.12.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  59/ 338]                 blk.12.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  60/ 338]                   blk.12.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  61/ 338]                 blk.12.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  62/ 338]              blk.13.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  63/ 338]               blk.13.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[  64/ 338]               blk.13.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  65/ 338]                 blk.13.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  66/ 338]               blk.13.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  67/ 338]                   blk.13.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  68/ 338]                 blk.13.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  69/ 338]            blk.13.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  70/ 338]                   blk.13.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  71/ 338]                 blk.13.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  72/ 338]                   blk.13.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  73/ 338]                 blk.13.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  74/ 338]              blk.14.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  75/ 338]               blk.14.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  76/ 338]               blk.14.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  77/ 338]                 blk.14.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  78/ 338]               blk.14.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  79/ 338]                   blk.14.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  80/ 338]                 blk.14.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  81/ 338]            blk.14.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  82/ 338]                   blk.14.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  83/ 338]                 blk.14.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  84/ 338]                   blk.14.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  85/ 338]                 blk.14.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  86/ 338]              blk.15.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  87/ 338]               blk.15.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  88/ 338]               blk.15.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  89/ 338]                 blk.15.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  90/ 338]               blk.15.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  91/ 338]                   blk.15.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  92/ 338]                 blk.15.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  93/ 338]            blk.15.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  94/ 338]                   blk.15.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  95/ 338]                 blk.15.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  96/ 338]                   blk.15.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  97/ 338]                 blk.15.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  98/ 338]                   blk.16.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  99/ 338]                 blk.16.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 100/ 338]            blk.16.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 101/ 338]                   blk.16.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 102/ 338]                 blk.16.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 103/ 338]                   blk.16.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 104/ 338]                 blk.16.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 105/ 338]               blk.2.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 106/ 338]                blk.2.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 107/ 338]                blk.2.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 108/ 338]                  blk.2.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 109/ 338]                blk.2.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 110/ 338]                    blk.2.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 111/ 338]                  blk.2.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 112/ 338]             blk.2.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 113/ 338]                    blk.2.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 114/ 338]                  blk.2.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 115/ 338]                    blk.2.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 116/ 338]                  blk.2.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 117/ 338]               blk.3.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 118/ 338]                blk.3.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 119/ 338]                blk.3.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 120/ 338]                  blk.3.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 121/ 338]                blk.3.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 122/ 338]                    blk.3.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 123/ 338]                  blk.3.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 124/ 338]             blk.3.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 125/ 338]                    blk.3.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 126/ 338]                  blk.3.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 127/ 338]                    blk.3.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 128/ 338]                  blk.3.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 129/ 338]               blk.4.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 130/ 338]                blk.4.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 131/ 338]                blk.4.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 132/ 338]                  blk.4.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 133/ 338]                blk.4.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 134/ 338]                    blk.4.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 135/ 338]                  blk.4.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 136/ 338]             blk.4.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 137/ 338]                    blk.4.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 138/ 338]                  blk.4.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 139/ 338]                    blk.4.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 140/ 338]                  blk.4.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 141/ 338]               blk.5.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 142/ 338]                blk.5.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 143/ 338]                blk.5.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 144/ 338]                  blk.5.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 145/ 338]                blk.5.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 146/ 338]                    blk.5.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 147/ 338]                  blk.5.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 148/ 338]             blk.5.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 149/ 338]                    blk.5.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 150/ 338]                  blk.5.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 151/ 338]                    blk.5.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 152/ 338]                  blk.5.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 153/ 338]               blk.6.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 154/ 338]                blk.6.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 155/ 338]                blk.6.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 156/ 338]                  blk.6.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 157/ 338]                blk.6.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 158/ 338]                    blk.6.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 159/ 338]                  blk.6.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 160/ 338]             blk.6.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 161/ 338]                    blk.6.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 162/ 338]                  blk.6.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 163/ 338]                    blk.6.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 164/ 338]                  blk.6.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 165/ 338]               blk.7.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 166/ 338]                blk.7.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 167/ 338]                blk.7.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 168/ 338]                  blk.7.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 169/ 338]                blk.7.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 170/ 338]                    blk.7.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 171/ 338]                  blk.7.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 172/ 338]             blk.7.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 173/ 338]                    blk.7.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 174/ 338]                  blk.7.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 175/ 338]                    blk.7.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 176/ 338]                  blk.7.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 177/ 338]               blk.8.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 178/ 338]                blk.8.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 179/ 338]                blk.8.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 180/ 338]                  blk.8.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 181/ 338]                blk.8.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 182/ 338]                    blk.8.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 183/ 338]                  blk.8.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 184/ 338]             blk.8.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 185/ 338]                    blk.8.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 186/ 338]                  blk.8.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 187/ 338]                    blk.8.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 188/ 338]                  blk.8.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 189/ 338]               blk.9.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 190/ 338]                blk.9.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 191/ 338]                blk.9.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 192/ 338]                  blk.9.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 193/ 338]                blk.9.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 194/ 338]                    blk.9.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 195/ 338]                  blk.9.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 196/ 338]             blk.9.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 197/ 338]                    blk.9.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 198/ 338]                  blk.9.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 199/ 338]                    blk.9.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 200/ 338]                  blk.9.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 201/ 338]              blk.16.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 202/ 338]               blk.16.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 203/ 338]               blk.16.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 204/ 338]                 blk.16.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 205/ 338]               blk.16.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 206/ 338]              blk.17.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 207/ 338]               blk.17.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 208/ 338]               blk.17.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 209/ 338]                 blk.17.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 210/ 338]               blk.17.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 211/ 338]                   blk.17.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 212/ 338]                 blk.17.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 213/ 338]            blk.17.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 214/ 338]                   blk.17.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 215/ 338]                 blk.17.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 216/ 338]                   blk.17.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 217/ 338]                 blk.17.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 218/ 338]              blk.18.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 219/ 338]               blk.18.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 220/ 338]               blk.18.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 221/ 338]                 blk.18.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 222/ 338]               blk.18.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 223/ 338]                   blk.18.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 224/ 338]                 blk.18.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 225/ 338]            blk.18.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 226/ 338]                   blk.18.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 227/ 338]                 blk.18.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 228/ 338]                   blk.18.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 229/ 338]                 blk.18.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 230/ 338]              blk.19.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 231/ 338]               blk.19.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 232/ 338]               blk.19.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 233/ 338]                 blk.19.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 234/ 338]               blk.19.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 235/ 338]                   blk.19.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 236/ 338]                 blk.19.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 237/ 338]            blk.19.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 238/ 338]                   blk.19.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 239/ 338]                 blk.19.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 240/ 338]                   blk.19.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 241/ 338]                 blk.19.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 242/ 338]              blk.20.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 243/ 338]               blk.20.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 244/ 338]               blk.20.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 245/ 338]                 blk.20.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 246/ 338]               blk.20.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 247/ 338]                   blk.20.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 248/ 338]                 blk.20.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 249/ 338]            blk.20.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 250/ 338]                   blk.20.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 251/ 338]                 blk.20.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 252/ 338]                   blk.20.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 253/ 338]                 blk.20.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 254/ 338]              blk.21.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 255/ 338]               blk.21.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 256/ 338]               blk.21.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 257/ 338]                 blk.21.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 258/ 338]               blk.21.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 259/ 338]                   blk.21.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 260/ 338]                 blk.21.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 261/ 338]            blk.21.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 262/ 338]                   blk.21.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 263/ 338]                 blk.21.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 264/ 338]                   blk.21.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 265/ 338]                 blk.21.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 266/ 338]              blk.22.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 267/ 338]               blk.22.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 268/ 338]               blk.22.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 269/ 338]                 blk.22.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 270/ 338]               blk.22.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 271/ 338]                   blk.22.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 272/ 338]                 blk.22.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 273/ 338]            blk.22.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 274/ 338]                   blk.22.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 275/ 338]                 blk.22.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 276/ 338]                   blk.22.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 277/ 338]                 blk.22.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 278/ 338]              blk.23.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 279/ 338]               blk.23.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 280/ 338]               blk.23.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 281/ 338]                 blk.23.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 282/ 338]               blk.23.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 283/ 338]                   blk.23.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 284/ 338]                 blk.23.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 285/ 338]            blk.23.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 286/ 338]                   blk.23.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 287/ 338]                 blk.23.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 288/ 338]                   blk.23.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 289/ 338]                 blk.23.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 290/ 338]              blk.24.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 291/ 338]               blk.24.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 292/ 338]               blk.24.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 293/ 338]                 blk.24.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 294/ 338]               blk.24.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 295/ 338]                   blk.24.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 296/ 338]                 blk.24.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 297/ 338]            blk.24.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 298/ 338]                   blk.24.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 299/ 338]                 blk.24.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 300/ 338]                   blk.24.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 301/ 338]                 blk.24.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 302/ 338]              blk.25.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 303/ 338]               blk.25.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 304/ 338]               blk.25.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 305/ 338]                 blk.25.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 306/ 338]               blk.25.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 307/ 338]                   blk.25.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 308/ 338]                 blk.25.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 309/ 338]            blk.25.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 310/ 338]                   blk.25.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 311/ 338]                 blk.25.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 312/ 338]                   blk.25.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 313/ 338]                 blk.25.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 314/ 338]              blk.26.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 315/ 338]               blk.26.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 316/ 338]               blk.26.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 317/ 338]                 blk.26.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 318/ 338]               blk.26.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 319/ 338]                   blk.26.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 320/ 338]                 blk.26.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 321/ 338]            blk.26.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 322/ 338]                   blk.26.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 323/ 338]                 blk.26.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 324/ 338]                   blk.26.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 325/ 338]                 blk.26.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 326/ 338]              blk.27.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 327/ 338]               blk.27.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 328/ 338]               blk.27.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 329/ 338]                 blk.27.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 330/ 338]               blk.27.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 331/ 338]                   blk.27.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 332/ 338]                 blk.27.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 333/ 338]            blk.27.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 334/ 338]                   blk.27.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 335/ 338]                 blk.27.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 336/ 338]                   blk.27.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 337/ 338]                 blk.27.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 338/ 338]                   output_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
llama_model_quantize_internal: model size  =  2944.68 MB
llama_model_quantize_internal: quant size  =   934.69 MB

main: quantize time = 14228.85 ms
main:    total time = 14228.85 ms

最后,得到了Qwen2-1.5B-Instruct_Q4_k_M.gguf​模型。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1956581.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

【全面讲解下Docker in Docker的原理与实践】

🌈个人主页: 程序员不想敲代码啊 🏆CSDN优质创作者,CSDN实力新星,CSDN博客专家 👍点赞⭐评论⭐收藏 🤝希望本文对您有所裨益,如有不足之处,欢迎在评论区提出指正,让我们共同学习、交流进步! 👉目录 👉前言👉原理👉实践👉安全和最佳实践👉前言 🦛…

【保姆级教程】免费内网穿透,手把手搭建,三步搞定

在内网部署的一个应用&#xff0c;想分享给外网的小伙伴玩玩&#xff1f; 学校实验室有一台高性能服务器&#xff0c;在外网就无法使用&#xff1f; 来吧&#xff0c;内网穿透&#xff0c;了解一下&#xff1f; 1. 关于内网穿透 1.1 什么是内网穿透 且看百度百科的说法&am…

【AI人工智能】文心智能体,00后疯感工牌生成器,低代码工作流的简单应用以及图片快速响应解决方案,干货满满,不容错过哦

背景 文心智能体平台&#xff0c;开启新一轮活动&#xff0c;超级创造营持续百日活动。 在AI 浪潮席卷的今天&#xff0c;如雨后春笋般丛生的 AI 应用&#xff0c;昭告着时代风口显然已随之到来。 如何能把握住时代红利&#xff0c;占据风口&#xff0c;甚至打造新风向&#x…

深入探讨单元测试:概念、作用及应用实例

目录 前言1. 单元测试的概念1.1 单元测试的特点1.2 单元测试的组成部分 2. 单元测试的主要作用2.1 确保代码正确性2.2 提高代码质量2.3 促进重构2.4 支持持续集成 3. 单元测试在整个测试中的地位3.1 单元测试 vs 集成测试3.2 单元测试 vs 系统测试3.3 单元测试 vs 验收测试 4. …

VulnHub靶机入门篇--Kioptrix4

1.环境配置 下载地址&#xff1a; https://download.vulnhub.com/kioptrix/Kioptrix4_vmware.rar 下载完解压之后是一个vdmk文件&#xff0c;我们需要先创建一个新的虚拟机&#xff0c;将vdmk文件导入就行了 先移除原先硬盘&#xff0c;然后再进行添加&#xff0c;网络连接为…

Computer Software Copyright Registration Certificate

Computer Software Copyright Registration Certificate 计算机软件著作权登记证书

项目实战--JUC之CompletableFuture温故

CompletableFuture温故 一、前言二、Future三、CompletableFuture3.1 CompletableFuture定义3.2 CompletableFuture使用场景3.3 CompletableFuture 常见操作3.3.1 创建CompletableFuture3.3.2 使用CompletableFuture3.3.3 异常处理3.3.4 注意事项 四、CompletableFuture处理工具…

51单片机嵌入式开发:21、STC89C52R控制抢答器+数码管+后台显示+LCD1602x显示

配套程序 STC89C52R控制抢答器数码管后台显示LCD1602x显示 STC89C52R控制抢答器数码管后台显示LCD1602x显示1 概述1.1 项目概述1.2 项目组成部分1.3 功能描述 2 开发环境2.1 支持设备2.2 硬件电路 3 软件代码工程4 演示Proteus仿真5 总结 配套程序 STC89C52R控制抢答器数码管后…

oracle登录报“ORA-27101: shared memory realm does not exist”

oracle登录报“ORA-27101: shared memory realm does not exist” 问题&#xff1a; 1、使用ip:1521/服务名方式连库报错" ORA-27101: shared memory realm does not exist Linux-x86_64 Error: 2: No such file or directory" 2、sqlplus XX/密码 可以登录数据库 …

linux下使用yum安装mysql

本文使用常规方式手动安装mysql 第一步 下载mysql的repo源 wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm第二步 安装mysql-community-release-el7-5.noarch.rpm包 rpm -ivh mysql-community-release-el7-5.noarch.rpm第三步 安装mysql-server yum -y…

一文步让成功搭建AI数字人直播系统!

随着AI数字人直播技术的不断成熟&#xff0c;越来越多的企业都开始使用或计划引入AI数字人直播&#xff0c;来实现直播板块的降本增效。这也让不少创业者看到了AI数字人直播所蕴含着的巨大市场需求和广阔收益空间&#xff0c;从而有了搭建AI数字人直播系统源码的想法。 ​不过&…

flask学习小结

背景 通过官方文档学习 quickstart 第一个app 代码 文件命名为hello.py windows执行set FLASK_APPhello&#xff0c;linux执行export FLASK_APPhello&#xff0c;然后执行flask run即可运行app 默认运行的app是127.0.0.1:5000&#xff0c;可通过-h -p参数改ip地址和端口。…

windows下定时执行bat脚本删除无用文件和文件夹

文章目录 1. 删除脚本编写2. 配置任务计划程序1.搜索任务计划程序2. 创建任务3. 触发器配置4. 创建任务–>操作5. 创建任务–>条件6. 创建任务–>设置7. 确定保存设置 1. 删除脚本编写 echo off REM 设置源目录和要删除的文件的天数阈值 SrcDirE:\xxx DaysAgo30REM 从…

二阶段测试

二阶段测试 1、部署框架前准备工作 服务器类型部署组件ip地址DR1调度服务器 主&#xff08;ha01&#xff09;KeepalivedLVS-DR192.168.168.21DR2调度服务器 备 (ha02)KeepalivedLVS-DR192.168.168.22web1节点服务器 (slave01)NginxTomcatMySQL 备MHA managerMHA node192.168.1…

【OpenCV C++20 学习笔记】图片融合

图片融合 原理实现结果展示完整代码 原理 关于OpenCV的配置和基础用法&#xff0c;请参阅本专栏的其他文章&#xff1a;垚武田的OpenCV合集 这里采用的图片熔合的算法来自Richard Szeliski的书《Computer Vision: Algorithms and Applications》&#xff08;《计算机视觉&#…

【C++】循环结构-for循环

for 循环语法格式&#xff1a; for(起始表达式;条件表达式;末尾循环体) {循环语句} 注意&#xff1a;括号内的三者也可以写出来写在循环语句里&#xff0c;但使用for循环就是为了使循环更加简洁明了&#xff0c;因此不仅以这么做 下面是一个实例 #include<iostream> …

ROS安装key NO_PUBKEY F42ED6FBAB17C654

解决方案 可以手工添加。 ROS1云课→18一键配置_linux ros1配置-CSDN博客 下载ros.key&#xff0c;然后手工添加到对应位置。 ros2407.key master zhangrelay / ros_book GitCode 问题描述 ros2ros2-20l1a001cd:~$ sudo apt update Hit:1 http://ftp.sjtu.edu.cn/ubunt…

010 仿muduo实现高性能服务器组件_Http协议模块

​&#x1f308;个人主页&#xff1a;Fan_558 &#x1f525; 系列专栏&#xff1a;仿muduo &#x1f4d2;代码仓库&#xff1a; 项目代码 &#x1f339;关注我&#x1f4aa;&#x1f3fb;带你学更多知识 文章目录 前言Util模块设计意义整体设计代码如下 HttpRequest模块代码如下…

初始K8s

K8S 基本概念: K8S 的全称为 Kubernetes (K12345678S)&#xff0c;PS&#xff1a;“嘛&#xff0c;写全称也太累了吧&#xff0c;不如整个缩写”。 作用&#xff1a; 用于自动部署、扩展和管理“容器化&#xff08;containerized&#xff09;应用程序”的开源系统。 可以理解成…

web自动化6-pytest③实践测试用例-回归用例web自动化

# -*- coding: utf-8 -*- """ lemut_select - 业务受理 Author: duxiaowei Date: 2024/7/17 """ import timeimport allure import pytest from selenium.webdriver.common.by import By# 业务受理 allure.feature("业务受理") class …