Text2SQL Fine-Tuning of the Qwen2 Model and GGUF Quantization

Fine-Tuning Qwen2-1.5B

Preparing the Python Environment

conda create --name llama_factory python=3.11
conda activate llama_factory

Deploying LLaMA-Factory

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip3 install -e ".[torch,metrics]"
# To enable quantized LoRA (QLoRA) on Windows, install the prebuilt bitsandbytes wheel
pip3 install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
# Install PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# If startup fails with ImportError: cannot import name 'get_full_repo_name' from 'huggingface_hub', install chardet
pip3 install chardet

Run python src/webui.py to start the web UI:

Text2SQL Fine-Tuning

(1) Prepare a sample dataset, for example:

{
    "db_id": "department_management",
    "instruction": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\ndepartment_management contains tables such as department, head, management. Table department has columns such as Department_ID, Name, Creation, Ranking, Budget_in_Billions, Num_Employees. Department_ID is the primary key.\nTable head has columns such as head_ID, name, born_state, age. head_ID is the primary key.\nTable management has columns such as department_ID, head_ID, temporary_acting. department_ID is the primary key.\nThe head_ID of management is the foreign key of head_ID of head.\nThe department_ID of management is the foreign key of Department_ID of department.\n\n",
    "input": "###Input:\nHow many heads of the departments are older than 56 ?\n\n###Response:",
    "output": "SELECT count(*) FROM head WHERE age  >  56",
    "history": []
}
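
If your raw corpus is a set of (schema description, question, SQL) triples in the Spider style, a small script can pack them into records of this shape. Below is a minimal sketch under that assumption; make_record is my own illustrative helper (not part of LLaMA-Factory) and the schema string is abbreviated:

import json

def make_record(db_id: str, schema_desc: str, question: str, sql: str) -> dict:
    # Mirror the record layout shown above: the schema description goes into
    # "instruction", the question into "input", and the gold SQL into "output".
    instruction = (
        "I want you to act as a SQL terminal in front of an example database, "
        "you need only to return the sql command to me.\n"
        f"##Instruction:\n{schema_desc}\n\n"
    )
    return {
        "db_id": db_id,
        "instruction": instruction,
        "input": f"###Input:\n{question}\n\n###Response:",
        "output": sql,
        "history": [],
    }

records = [
    make_record(
        "department_management",
        "department_management contains tables such as department, head, management. ...",
        "How many heads of the departments are older than 56 ?",
        "SELECT count(*) FROM head WHERE age > 56",
    )
]

# LLaMA-Factory expects the dataset file to be a JSON array of such records.
with open("text2sql_train.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)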

(2) Register the dataset. Copy the dataset JSON files into the LLaMA-Factory/data directory and add the following entries to dataset_info.json:

"text2sql_train": {
    "file_name": "text2sql_train.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "history": "history"
    }
},
"text2sql_dev": {
    "file_name": "text2sql_dev.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "history": "history"
    }
}
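
Before training, it is worth a quick sanity check that the file parses and that every record carries the registered columns. A minimal sketch (my own check, not a LLaMA-Factory tool; adjust the path to your setup):

import json

with open("data/text2sql_train.json", encoding="utf-8") as f:
    records = json.load(f)

# These keys must match the "columns" mapping registered in dataset_info.json.
required = {"instruction", "input", "output", "history"}
for i, rec in enumerate(records):
    missing = required - rec.keys()
    assert not missing, f"record {i} is missing columns: {missing}"

print(f"{len(records)} records OK")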

(3) Configure the parameters in the LLaMA-Factory web UI; the previewed command is as follows (adjust the parameters to your own setup):

llamafactory-cli train `
    --stage sft `
    --do_train True `
    --model_name_or_path D:\\LLM\\Qwen2-1.5B-Instruct `
    --preprocessing_num_workers 16 `
    --finetuning_type lora `
    --quantization_method bitsandbytes `
    --template qwen `
    --flash_attn auto `
    --dataset_dir D:\\python_project\\LLaMA-Factory\\data `
    --dataset text2sql_train `
    --cutoff_len 1024 `
    --learning_rate 0.0001 `
    --num_train_epochs 2.0 `
    --max_samples 100000 `
    --per_device_train_batch_size 2 `
    --gradient_accumulation_steps 4 `
    --lr_scheduler_type cosine `
    --max_grad_norm 1.0 `
    --logging_steps 20 `
    --save_steps 500 `
    --warmup_steps 0 `
    --optim adamw_torch `
    --packing False `
    --report_to none `
    --output_dir saves\Qwen2-1.5B-Chat\lora\train_2024-07-19-19-45-59 `
    --bf16 True `
    --plot_loss True `
    --ddp_timeout 180000000 `
    --include_num_input_tokens_seen True `
    --lora_rank 8 `
    --lora_alpha 16 `
    --lora_dropout 0 `
    --lora_target all
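
Note that with per_device_train_batch_size 2 and gradient_accumulation_steps 4, the effective batch size is 2 × 4 = 8 per device. Once training finishes, the adapter can also be smoke-tested outside the web UI by stacking it on the base model with peft. A sketch using the paths from the command above (plain transformers/peft APIs, not a LLaMA-Factory command; in a real prompt you would prepend the schema instruction as in the dataset):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "D:/LLM/Qwen2-1.5B-Instruct"
adapter = "saves/Qwen2-1.5B-Chat/lora/train_2024-07-19-19-45-59"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA adapter

# Build a chat prompt with the model's template, matching --template qwen above.
question = "###Input:\nHow many heads of the departments are older than 56 ?\n\n###Response:"
messages = [{"role": "user", "content": question}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))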

(4) After fine-tuning, run inference evaluation on the test set:

{
    "predict_bleu-4": 88.43791015473889,
    "predict_rouge-1": 92.31425483558995,
    "predict_rouge-2": 85.43010570599614,
    "predict_rouge-l": 89.06327794970986,
    "predict_runtime": 1027.4111,
    "predict_samples_per_second": 1.006,
    "predict_steps_per_second": 0.503
}

Judging from the metrics above, the model performs quite well; you can set the parameters yourself based on your own sample data. Here is a breakdown of the evaluation metrics:

1. predict_bleu-4:
    * BLEU (Bilingual Evaluation Understudy) is a metric commonly used to evaluate machine-translation quality.
    * BLEU-4 is the 4-gram BLEU score; it measures the n-gram overlap (up to n=4) between the generated text and the reference text.
    * Higher values mean the generated text is more similar to the reference; the maximum is 100.

2. predict_rouge-1 and predict_rouge-2:
    * ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric for evaluating automatic summarization and text-generation models.
    * ROUGE-1 is the unigram ROUGE score and ROUGE-2 the bigram ROUGE score; they measure the single-word and two-word-sequence overlap between the generated text and the reference.
    * Higher values mean the generated text is more similar to the reference; the maximum is 100.

3. predict_rouge-l:
    * ROUGE-L measures the longest-common-subsequence (LCS) overlap between the generated text and the reference.
    * Higher values mean the generated text is more similar to the reference; the maximum is 100.

4. predict_runtime:
    * Prediction runtime: the total time the model spent generating the whole batch of samples.
    * Usually measured in seconds.

5. predict_samples_per_second:
    * The number of samples the model generates per second.
    * Commonly used to gauge inference speed.

6. predict_steps_per_second:
    * The number of prediction steps executed per second.
    * For a generative model, this is the number of generation steps performed per second.
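
For intuition, here is how BLEU-4 can be computed for a single prediction/reference pair with NLTK (my own illustration; the 0-1 score is multiplied by 100 to match the scale reported above):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "SELECT count(*) FROM head WHERE age > 56".split()
prediction = "SELECT count(*) FROM head WHERE age >= 56".split()

# Smoothing avoids zero scores when some higher-order n-grams have no match.
smooth = SmoothingFunction().method3
bleu4 = sentence_bleu([reference], prediction,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(f"BLEU-4: {bleu4 * 100:.2f}")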

(5) Since the evaluation meets the fine-tuning expectations, we can export the fine-tuned model directly (be sure to select the checkpoint produced by fine-tuning):

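Under the hood, exporting amounts to merging the LoRA deltas into the base weights and saving a standalone model. A rough peft equivalent (a sketch, not the exact code the web UI runs; paths follow the examples in this post):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "D:/LLM/Qwen2-1.5B-Instruct"
adapter = "saves/Qwen2-1.5B-Chat/lora/train_2024-07-19-19-45-59"
out_dir = "D:/python_project/LLaMA-Factory/output_model/model2"

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter)
model = model.merge_and_unload()  # fold W + (lora_alpha / r) * BA back into the base Linear weights

model.save_pretrained(out_dir, safe_serialization=True)  # writes safetensors shards
AutoTokenizer.from_pretrained(base).save_pretrained(out_dir)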

GGUF Model

The export above produces a safetensors model; we can use llama.cpp to convert it to a GGUF model. On Windows, the w64devkit + make approach is recommended for building llama.cpp.

Installing make

下载地址:https://gnuwin32.sourceforge.net/packages/make.htm

Download and install the setup package from that page, then add the installation path to the PATH environment variable. In a terminal, running the following command should print the Make version info:

(llama_factory) D:\LLM\llama.cpp>make -v
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-pc-mingw32

Installing MinGW

Download the release files, then add mingw64/bin to the PATH environment variable.

Installing w64devkit

下载地址:https://github.com/skeeto/w64devkit/releases

Download w64devkit-fortran-1.23.0.zip and extract it. Note: the extraction path must not contain Chinese characters.

Deploying llama.cpp

下载地址:https://github.com/ggerganov/llama.cpp

Open the llama.cpp directory with the w64devkit.exe we just downloaded, then run make; once it succeeds you will have a set of .exe files.

~ $ cd D:
D:/ $ cd /LLM/llama.cpp
D:/LLM/llama.cpp $ make

After make succeeds, install the Python dependencies:

conda create --name llama_cpp python=3.11
conda activate llama_cpp
pip3 install -r requirements.txt

Converting to GGUF FP16 Format

# model_path/mymodel is the actual path to your model
python convert_hf_to_gguf.py model_path/mymodel
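
The script also takes flags to control the output file and precision; on recent llama.cpp checkouts these include --outfile and --outtype (the run log below is from the plain invocation above; run the script with --help to confirm the flags on your checkout):

# Optional: choose the output path and dtype explicitly
python convert_hf_to_gguf.py model_path/mymodel --outfile model_path/mymodel-F16.gguf --outtype f16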

Run log:

(llama_cpp) D:\LLM\llama.cpp>python convert_hf_to_gguf.py D:/python_project/LLaMA-Factory/output_model/model2
INFO:hf-to-gguf:Loading model: model2
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:token_embd.weight,         torch.bfloat16 --> F16, shape = {1536, 151936}
INFO:hf-to-gguf:blk.0.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.0.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.0.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.0.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.0.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.1.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.1.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.1.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.1.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.1.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.10.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.10.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.10.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.10.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.11.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.11.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.11.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.11.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.11.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.11.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.12.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.12.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.12.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.12.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.13.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.13.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.13.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.13.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.14.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.14.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.14.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.14.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.15.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.15.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.15.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.15.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.16.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.16.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.16.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.2.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.2.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.2.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.2.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.2.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.3.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.3.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.3.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.3.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.3.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.4.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.4.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.4.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.4.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.4.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.5.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.5.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.5.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.5.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.5.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.5.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.5.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.6.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.6.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.6.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.6.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.6.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.7.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.7.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.7.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.7.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.7.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.8.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.8.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.8.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.8.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.8.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,     torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_k.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_k.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.9.attn_q.bias,         torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_q.weight,       torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.9.attn_v.bias,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_v.weight,       torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00002.safetensors'
INFO:hf-to-gguf:blk.16.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.16.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.17.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.17.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.17.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.17.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.17.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.18.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.18.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.18.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.18.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.19.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.19.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.19.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.19.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.20.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.20.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.20.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.20.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.21.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.21.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.21.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.21.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.22.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.22.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.22.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.22.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.23.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.23.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.23.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.23.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.23.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.23.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.24.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.24.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.24.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.24.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.25.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.25.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.25.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.25.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.26.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.26.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.26.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.26.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.27.attn_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.ffn_down.weight,    torch.bfloat16 --> F16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,    torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.27.ffn_up.weight,      torch.bfloat16 --> F16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_k.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_k.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.27.attn_q.bias,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_q.weight,      torch.bfloat16 --> F16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.27.attn_v.bias,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_v.weight,      torch.bfloat16 --> F16, shape = {1536, 256}
INFO:hf-to-gguf:output_norm.weight,        torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 1536
INFO:hf-to-gguf:gguf: feed forward length = 8960
INFO:hf-to-gguf:gguf: head count = 12
INFO:hf-to-gguf:gguf: key-value head count = 2
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:gguf.vocab:Adding 151387 merge(s).
INFO:gguf.vocab:Setting special token type eos to 151645
INFO:gguf.vocab:Setting special token type pad to 151643
INFO:gguf.vocab:Setting special token type bos to 151643
INFO:gguf.vocab:Setting chat_template to {% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system
' + system_message + '<|im_end|>
' }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user
' + content + '<|im_end|>
<|im_start|>assistant
' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '
' }}{% endif %}{% endfor %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:D:\Llm\Qwen2-1.5B-Instruct-F16.gguf: n_tensors = 338, total_size = 3.1G
Writing: 100%|██████████████████████████████████████████████████████████████████| 3.09G/3.09G [00:14<00:00, 218Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to D:\Llm\Qwen2-1.5B-Instruct-F16.gguf

Model Quantization

# model_path/mymodel-F16.gguf is the FP16 model we just produced; model_path/mymodel_Q4_k_M.gguf is the quantized model to create; Q4_K_M selects the Q4_K_M quantization method
llama-quantize.exe model_path/mymodel-F16.gguf model_path/mymodel_Q4_k_M.gguf Q4_K_M
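
Quantization progress appears in the run log below. Once it completes, the quantized model can be smoke-tested with llama-cli.exe from the same llama.cpp build (a sketch; exact flags may differ slightly between versions, see llama-cli --help):

# -m selects the model, -p the prompt, -n the number of tokens to generate
llama-cli.exe -m model_path/mymodel_Q4_k_M.gguf -p "How many heads of the departments are older than 56 ?" -n 128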

Run log:

(llama_cpp) D:\LLM\llama.cpp>llama-quantize.exe D:/Llm/Qwen2-1.5B-Instruct-F16.gguf D:/Llm/Qwen2-1.5B-Instruct_Q4_k_M.gguf Q4_K_M
main: build = 0 (unknown)
main: built with cc (GCC) 14.1.0 for x86_64-w64-mingw32
main: quantizing 'D:/Llm/Qwen2-1.5B-Instruct-F16.gguf' to 'D:/Llm/Qwen2-1.5B-Instruct_Q4_k_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 25 key-value pairs and 338 tensors from D:/Llm/Qwen2-1.5B-Instruct-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = D:\\LLM\\Qwen2 1.5B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = D:\\LLM\\Qwen2
llama_model_loader: - kv   5:                         general.size_label str              = 1.5B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 28
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 1536
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 8960
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 12
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 1
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% set system_message = 'You are a he...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type  f16:  197 tensors
[   1/ 338]                    token_embd.weight - [ 1536, 151936,     1,     1], type =    f16, converting to q6_K .. size =   445.12 MiB ->   182.57 MiB
[   2/ 338]               blk.0.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[   3/ 338]                blk.0.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[   4/ 338]                blk.0.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[   5/ 338]                  blk.0.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[   6/ 338]                blk.0.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[   7/ 338]                    blk.0.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[   8/ 338]                  blk.0.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[   9/ 338]             blk.0.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  10/ 338]                    blk.0.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  11/ 338]                  blk.0.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  12/ 338]                    blk.0.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  13/ 338]                  blk.0.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  14/ 338]               blk.1.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  15/ 338]                blk.1.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[  16/ 338]                blk.1.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  17/ 338]                  blk.1.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  18/ 338]                blk.1.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  19/ 338]                    blk.1.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  20/ 338]                  blk.1.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  21/ 338]             blk.1.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  22/ 338]                    blk.1.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  23/ 338]                  blk.1.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  24/ 338]                    blk.1.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  25/ 338]                  blk.1.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  26/ 338]              blk.10.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  27/ 338]               blk.10.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[  28/ 338]               blk.10.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  29/ 338]                 blk.10.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  30/ 338]               blk.10.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  31/ 338]                   blk.10.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  32/ 338]                 blk.10.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  33/ 338]            blk.10.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  34/ 338]                   blk.10.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  35/ 338]                 blk.10.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  36/ 338]                   blk.10.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  37/ 338]                 blk.10.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  38/ 338]              blk.11.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  39/ 338]               blk.11.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  40/ 338]               blk.11.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  41/ 338]                 blk.11.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  42/ 338]               blk.11.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  43/ 338]                   blk.11.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  44/ 338]                 blk.11.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  45/ 338]            blk.11.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  46/ 338]                   blk.11.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  47/ 338]                 blk.11.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  48/ 338]                   blk.11.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  49/ 338]                 blk.11.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  50/ 338]              blk.12.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  51/ 338]               blk.12.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  52/ 338]               blk.12.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  53/ 338]                 blk.12.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  54/ 338]               blk.12.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  55/ 338]                   blk.12.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  56/ 338]                 blk.12.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  57/ 338]            blk.12.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  58/ 338]                   blk.12.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  59/ 338]                 blk.12.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  60/ 338]                   blk.12.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  61/ 338]                 blk.12.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  62/ 338]              blk.13.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  63/ 338]               blk.13.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[  64/ 338]               blk.13.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  65/ 338]                 blk.13.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  66/ 338]               blk.13.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  67/ 338]                   blk.13.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  68/ 338]                 blk.13.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  69/ 338]            blk.13.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  70/ 338]                   blk.13.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  71/ 338]                 blk.13.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  72/ 338]                   blk.13.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  73/ 338]                 blk.13.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[  74/ 338]              blk.14.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  75/ 338]               blk.14.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  76/ 338]               blk.14.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  77/ 338]                 blk.14.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  78/ 338]               blk.14.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  79/ 338]                   blk.14.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  80/ 338]                 blk.14.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  81/ 338]            blk.14.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  82/ 338]                   blk.14.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  83/ 338]                 blk.14.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  84/ 338]                   blk.14.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  85/ 338]                 blk.14.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  86/ 338]              blk.15.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  87/ 338]               blk.15.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  88/ 338]               blk.15.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  89/ 338]                 blk.15.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[  90/ 338]               blk.15.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  91/ 338]                   blk.15.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  92/ 338]                 blk.15.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  93/ 338]            blk.15.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  94/ 338]                   blk.15.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[  95/ 338]                 blk.15.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[  96/ 338]                   blk.15.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  97/ 338]                 blk.15.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[  98/ 338]                   blk.16.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[  99/ 338]                 blk.16.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 100/ 338]            blk.16.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 101/ 338]                   blk.16.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 102/ 338]                 blk.16.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 103/ 338]                   blk.16.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 104/ 338]                 blk.16.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 105/ 338]               blk.2.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 106/ 338]                blk.2.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 107/ 338]                blk.2.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 108/ 338]                  blk.2.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 109/ 338]                blk.2.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 110/ 338]                    blk.2.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 111/ 338]                  blk.2.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 112/ 338]             blk.2.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 113/ 338]                    blk.2.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 114/ 338]                  blk.2.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 115/ 338]                    blk.2.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 116/ 338]                  blk.2.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 117/ 338]               blk.3.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 118/ 338]                blk.3.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 119/ 338]                blk.3.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 120/ 338]                  blk.3.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 121/ 338]                blk.3.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 122/ 338]                    blk.3.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 123/ 338]                  blk.3.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 124/ 338]             blk.3.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 125/ 338]                    blk.3.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 126/ 338]                  blk.3.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 127/ 338]                    blk.3.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 128/ 338]                  blk.3.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 129/ 338]               blk.4.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 130/ 338]                blk.4.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 131/ 338]                blk.4.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 132/ 338]                  blk.4.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 133/ 338]                blk.4.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 134/ 338]                    blk.4.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 135/ 338]                  blk.4.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 136/ 338]             blk.4.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 137/ 338]                    blk.4.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 138/ 338]                  blk.4.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 139/ 338]                    blk.4.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 140/ 338]                  blk.4.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 141/ 338]               blk.5.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 142/ 338]                blk.5.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 143/ 338]                blk.5.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 144/ 338]                  blk.5.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 145/ 338]                blk.5.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 146/ 338]                    blk.5.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 147/ 338]                  blk.5.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 148/ 338]             blk.5.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 149/ 338]                    blk.5.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 150/ 338]                  blk.5.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 151/ 338]                    blk.5.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 152/ 338]                  blk.5.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 153/ 338]               blk.6.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 154/ 338]                blk.6.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 155/ 338]                blk.6.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 156/ 338]                  blk.6.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 157/ 338]                blk.6.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 158/ 338]                    blk.6.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 159/ 338]                  blk.6.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 160/ 338]             blk.6.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 161/ 338]                    blk.6.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 162/ 338]                  blk.6.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 163/ 338]                    blk.6.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 164/ 338]                  blk.6.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 165/ 338]               blk.7.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 166/ 338]                blk.7.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 167/ 338]                blk.7.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 168/ 338]                  blk.7.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 169/ 338]                blk.7.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 170/ 338]                    blk.7.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 171/ 338]                  blk.7.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 172/ 338]             blk.7.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 173/ 338]                    blk.7.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 174/ 338]                  blk.7.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 175/ 338]                    blk.7.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 176/ 338]                  blk.7.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 177/ 338]               blk.8.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 178/ 338]                blk.8.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 179/ 338]                blk.8.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 180/ 338]                  blk.8.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 181/ 338]                blk.8.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 182/ 338]                    blk.8.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 183/ 338]                  blk.8.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 184/ 338]             blk.8.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 185/ 338]                    blk.8.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 186/ 338]                  blk.8.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 187/ 338]                    blk.8.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 188/ 338]                  blk.8.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 189/ 338]               blk.9.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 190/ 338]                blk.9.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 191/ 338]                blk.9.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 192/ 338]                  blk.9.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 193/ 338]                blk.9.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 194/ 338]                    blk.9.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 195/ 338]                  blk.9.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 196/ 338]             blk.9.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 197/ 338]                    blk.9.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 198/ 338]                  blk.9.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 199/ 338]                    blk.9.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 200/ 338]                  blk.9.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 201/ 338]              blk.16.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 202/ 338]               blk.16.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 203/ 338]               blk.16.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 204/ 338]                 blk.16.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 205/ 338]               blk.16.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 206/ 338]              blk.17.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 207/ 338]               blk.17.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 208/ 338]               blk.17.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 209/ 338]                 blk.17.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 210/ 338]               blk.17.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 211/ 338]                   blk.17.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 212/ 338]                 blk.17.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 213/ 338]            blk.17.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 214/ 338]                   blk.17.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 215/ 338]                 blk.17.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 216/ 338]                   blk.17.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 217/ 338]                 blk.17.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 218/ 338]              blk.18.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 219/ 338]               blk.18.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 220/ 338]               blk.18.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 221/ 338]                 blk.18.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 222/ 338]               blk.18.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 223/ 338]                   blk.18.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 224/ 338]                 blk.18.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 225/ 338]            blk.18.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 226/ 338]                   blk.18.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 227/ 338]                 blk.18.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 228/ 338]                   blk.18.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 229/ 338]                 blk.18.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 230/ 338]              blk.19.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 231/ 338]               blk.19.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 232/ 338]               blk.19.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 233/ 338]                 blk.19.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 234/ 338]               blk.19.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 235/ 338]                   blk.19.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 236/ 338]                 blk.19.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 237/ 338]            blk.19.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 238/ 338]                   blk.19.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 239/ 338]                 blk.19.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 240/ 338]                   blk.19.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 241/ 338]                 blk.19.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 242/ 338]              blk.20.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 243/ 338]               blk.20.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 244/ 338]               blk.20.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 245/ 338]                 blk.20.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 246/ 338]               blk.20.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 247/ 338]                   blk.20.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 248/ 338]                 blk.20.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 249/ 338]            blk.20.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 250/ 338]                   blk.20.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 251/ 338]                 blk.20.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 252/ 338]                   blk.20.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 253/ 338]                 blk.20.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 254/ 338]              blk.21.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 255/ 338]               blk.21.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 256/ 338]               blk.21.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 257/ 338]                 blk.21.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 258/ 338]               blk.21.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 259/ 338]                   blk.21.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 260/ 338]                 blk.21.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 261/ 338]            blk.21.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 262/ 338]                   blk.21.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 263/ 338]                 blk.21.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 264/ 338]                   blk.21.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 265/ 338]                 blk.21.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 266/ 338]              blk.22.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 267/ 338]               blk.22.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 268/ 338]               blk.22.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 269/ 338]                 blk.22.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 270/ 338]               blk.22.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 271/ 338]                   blk.22.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 272/ 338]                 blk.22.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 273/ 338]            blk.22.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 274/ 338]                   blk.22.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 275/ 338]                 blk.22.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 276/ 338]                   blk.22.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 277/ 338]                 blk.22.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 278/ 338]              blk.23.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 279/ 338]               blk.23.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 280/ 338]               blk.23.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 281/ 338]                 blk.23.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 282/ 338]               blk.23.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 283/ 338]                   blk.23.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 284/ 338]                 blk.23.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 285/ 338]            blk.23.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 286/ 338]                   blk.23.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 287/ 338]                 blk.23.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 288/ 338]                   blk.23.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 289/ 338]                 blk.23.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 290/ 338]              blk.24.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 291/ 338]               blk.24.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 292/ 338]               blk.24.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 293/ 338]                 blk.24.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 294/ 338]               blk.24.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 295/ 338]                   blk.24.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 296/ 338]                 blk.24.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 297/ 338]            blk.24.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 298/ 338]                   blk.24.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 299/ 338]                 blk.24.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 300/ 338]                   blk.24.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 301/ 338]                 blk.24.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 302/ 338]              blk.25.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 303/ 338]               blk.25.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 304/ 338]               blk.25.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 305/ 338]                 blk.25.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 306/ 338]               blk.25.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 307/ 338]                   blk.25.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 308/ 338]                 blk.25.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 309/ 338]            blk.25.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 310/ 338]                   blk.25.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 311/ 338]                 blk.25.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 312/ 338]                   blk.25.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 313/ 338]                 blk.25.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 314/ 338]              blk.26.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 315/ 338]               blk.26.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 316/ 338]               blk.26.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 317/ 338]                 blk.26.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 318/ 338]               blk.26.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 319/ 338]                   blk.26.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 320/ 338]                 blk.26.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 321/ 338]            blk.26.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 322/ 338]                   blk.26.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 323/ 338]                 blk.26.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 324/ 338]                   blk.26.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 325/ 338]                 blk.26.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 326/ 338]              blk.27.attn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 327/ 338]               blk.27.ffn_down.weight - [ 8960,  1536,     1,     1], type =    f16, converting to q6_K .. size =    26.25 MiB ->    10.77 MiB
[ 328/ 338]               blk.27.ffn_gate.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 329/ 338]                 blk.27.ffn_up.weight - [ 1536,  8960,     1,     1], type =    f16, converting to q4_K .. size =    26.25 MiB ->     7.38 MiB
[ 330/ 338]               blk.27.ffn_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 331/ 338]                   blk.27.attn_k.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 332/ 338]                 blk.27.attn_k.weight - [ 1536,   256,     1,     1], type =    f16, converting to q4_K .. size =     0.75 MiB ->     0.21 MiB
[ 333/ 338]            blk.27.attn_output.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 334/ 338]                   blk.27.attn_q.bias - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
[ 335/ 338]                 blk.27.attn_q.weight - [ 1536,  1536,     1,     1], type =    f16, converting to q4_K .. size =     4.50 MiB ->     1.27 MiB
[ 336/ 338]                   blk.27.attn_v.bias - [  256,     1,     1,     1], type =    f32, size =    0.001 MB
[ 337/ 338]                 blk.27.attn_v.weight - [ 1536,   256,     1,     1], type =    f16, converting to q6_K .. size =     0.75 MiB ->     0.31 MiB
[ 338/ 338]                   output_norm.weight - [ 1536,     1,     1,     1], type =    f32, size =    0.006 MB
llama_model_quantize_internal: model size  =  2944.68 MB
llama_model_quantize_internal: quant size  =   934.69 MB

main: quantize time = 14228.85 ms
main:    total time = 14228.85 ms

As the log shows, Q4_K_M is a mixed-precision scheme: most weight matrices are converted to q4_K, the more sensitive attn_v.weight and ffn_down.weight tensors in selected blocks are kept at the higher-precision q6_K, and all norm weights and biases remain in f32. Finally, we obtain the Qwen2-1.5B-Instruct_Q4_k_M.gguf model, shrunk from 2944.68 MB (f16) to 934.69 MB, roughly 32% of the original size.
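With the GGUF file in hand, it is worth a quick smoke test before wiring the model into an application. Below is a minimal sketch, assuming llama.cpp has been built locally and that a recent build is used (the CLI binary is named llama-cli; older builds ship it as main, and on Windows it is llama-cli.exe). The prompt shown is a hypothetical placeholder; in practice it should follow the same instruction/input format used during fine-tuning.

```bash
# Minimal smoke test of the quantized model with llama.cpp's CLI.
# Assumes a local llama.cpp build; the prompt text is a placeholder
# for illustration and should mirror the training prompt format.
./llama-cli \
    -m ./Qwen2-1.5B-Instruct_Q4_k_M.gguf \
    --temp 0 \
    -n 128 \
    -p "List the names of all departments with more than 10 employees."
```

Setting --temp 0 makes decoding greedy, which is usually preferable for SQL generation, where a single deterministic answer beats sampled variety. If the output matches what the merged, un-quantized model produces, the Q4_K_M file is ready to deploy.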
