Fine-tuning Qwen2-VL for the LaTeX OCR task - original model, single-image inference

2025/1/15 18:47:25

flyish

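Input

[image]

Output (the model answers in Chinese; translated here)

['This is a traditional Chinese landscape painting depicting a towering mountain peak surrounded by trees and vegetation. At the bottom of the painting is an open field, and distant mountains are faintly visible through the mist. At the top are several lines of calligraphy, possibly the painter\'s signature or a poem. There are several seals in the upper-right and lower-left corners, possibly those of the painter or of collectors. The overall palette is dominated by soft blue-green tones, giving a sense of tranquility and nature.']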

Use flash-attn to reduce resource consumption

pip install flash-attn --no-build-isolation
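
If flash-attn is not installed or fails to build, the attention backend can be chosen before the model is loaded. The following is a minimal sketch, not part of the original script: the helper name `pick_attn_implementation` is hypothetical, and falling back to `"sdpa"` is an assumption based on the attribute dump further down, which reports `_supports_sdpa: True` for this model.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if the flash_attn package can be found, else "sdpa"."""
    # find_spec only checks that the package can be located; it does not import it
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


attn_impl = pick_attn_implementation()
print(attn_impl)
```

The returned string could then be passed as `attn_implementation=attn_impl` to `from_pretrained` in place of the hard-coded value.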

The code is as follows

from PIL import Image
import requests  # only needed for the commented-out URL example below
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from modelscope import snapshot_download

# Download a snapshot of the model; snapshot_download returns the local directory it was saved to
model_dir = snapshot_download("qwen/Qwen2-VL-7B-Instruct")

# Load the model onto the available device(s); torch_dtype="auto" uses the dtype recorded in the checkpoint config
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_dir, torch_dtype="auto", device_map="auto", attn_implementation="flash_attention_2"
)

# Print all non-callable attributes of the model together with their values
print("\nModel Attributes and Their Values:")
attributes = dir(model)
for attr in attributes:
    try:
        value = getattr(model, attr)
        if not callable(value):
            print(f"{attr}: {value}")
    except AttributeError:
        continue


# Print the torch_dtype actually in use
print(f"Actual torch_dtype: {model.dtype}")


# Print the model configuration
print("\nModel Configuration:")
print(model.config)
# Load the processor (image preprocessing plus tokenizer)
processor = AutoProcessor.from_pretrained(model_dir,low_cpu_mem_usage=False)

# URL of the image
#url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"

# Load the image from the given URL
#image = Image.open(requests.get(url, stream=True).raw)

# Load the image from a local file
image_path = './QueHuaQiuSe1.png'
image = Image.open(image_path)


# Define the conversation: one user turn with an image and a text prompt ("描述这张图像。" means "Describe this image.")
conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
            },
            {"type": "text", "text": "描述这张图像。"},
        ],
    }
]

# Apply the chat template and append the generation prompt
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Preprocess the text and image into the tensors the model expects
inputs = processor(
    text=[text_prompt], images=[image], padding=True, return_tensors="pt"
)

# Move the inputs to the GPU
inputs = inputs.to("cuda")

# Inference: generate at most 128 new tokens
output_ids = model.generate(**inputs, max_new_tokens=128)

# generate() returns prompt tokens followed by new tokens; keep only the new ones
generated_ids = [
    out_ids[len(in_ids) :]
    for in_ids, out_ids in zip(inputs.input_ids, output_ids)
]

# Decode the generated token IDs into human-readable text
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)

# Print the generated description
print(output_text)
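
The slicing step near the end of the script relies on `generate` returning each prompt followed by the newly generated tokens. A minimal sketch of that step, using plain Python lists in place of tensors; the helper name `strip_prompt` and the token IDs are made up for illustration.

```python
from typing import List, Sequence


def strip_prompt(input_ids: Sequence[Sequence[int]],
                 output_ids: Sequence[Sequence[int]]) -> List[List[int]]:
    """Drop the leading prompt tokens from each generated sequence."""
    return [list(out[len(inp):]) for inp, out in zip(input_ids, output_ids)]


# Toy IDs (made up): the prompt has 3 tokens and the model appended 2
prompt_batch = [[101, 7, 8]]
generated_batch = [[101, 7, 8, 42, 43]]
print(strip_prompt(prompt_batch, generated_batch))  # [[42, 43]]
```

Without this step, the decoded text would also contain the chat-template prompt.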

Model attributes

Downloading Model to directory: /home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.45it/s]

Model Attributes and Their Values:
T_destination: ~T_destination
__annotations__: {'dump_patches': <class 'bool'>, '_version': <class 'int'>, 'training': <class 'bool'>, '_parameters': typing.Dict[str, typing.Optional[torch.nn.parameter.Parameter]], '_buffers': typing.Dict[str, typing.Optional[torch.Tensor]], '_non_persistent_buffers_set': typing.Set[str], '_backward_pre_hooks': typing.Dict[int, typing.Callable], '_backward_hooks': typing.Dict[int, typing.Callable], '_is_full_backward_hook': typing.Optional[bool], '_forward_hooks': typing.Dict[int, typing.Callable], '_forward_hooks_with_kwargs': typing.Dict[int, bool], '_forward_hooks_always_called': typing.Dict[int, bool], '_forward_pre_hooks': typing.Dict[int, typing.Callable], '_forward_pre_hooks_with_kwargs': typing.Dict[int, bool], '_state_dict_hooks': typing.Dict[int, typing.Callable], '_load_state_dict_pre_hooks': typing.Dict[int, typing.Callable], '_state_dict_pre_hooks': typing.Dict[int, typing.Callable], '_load_state_dict_post_hooks': typing.Dict[int, typing.Callable], '_modules': typing.Dict[str, typing.Optional[ForwardRef('Module')]], 'call_super_init': <class 'bool'>, '_compiled_call_impl': typing.Optional[typing.Callable], 'forward': typing.Callable[..., typing.Any], '__call__': typing.Callable[..., typing.Any]}
__dict__: {'training': False, '_parameters': {}, '_buffers': {}, '_non_persistent_buffers_set': set(), '_backward_pre_hooks': OrderedDict(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_hooks_with_kwargs': OrderedDict(), '_forward_hooks_always_called': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_forward_pre_hooks_with_kwargs': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': {'visual': Qwen2VisionTransformerPretrainedModel(
  (patch_embed): PatchEmbed(
    (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
  )
  (rotary_pos_emb): VisionRotaryEmbedding()
  (blocks): ModuleList(
    (0-31): 32 x Qwen2VLVisionBlock(
      (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
      (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
      (attn): VisionFlashAttention2(
        (qkv): Linear(in_features=1280, out_features=3840, bias=True)
        (proj): Linear(in_features=1280, out_features=1280, bias=True)
      )
      (mlp): VisionMlp(
        (fc1): Linear(in_features=1280, out_features=5120, bias=True)
        (act): QuickGELUActivation()
        (fc2): Linear(in_features=5120, out_features=1280, bias=True)
      )
    )
  )
  (merger): PatchMerger(
    (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
    (mlp): Sequential(
      (0): Linear(in_features=5120, out_features=5120, bias=True)
      (1): GELU(approximate='none')
      (2): Linear(in_features=5120, out_features=3584, bias=True)
    )
  )
), 'model': Qwen2VLModel(
  (embed_tokens): Embedding(152064, 3584)
  (layers): ModuleList(
    (0-27): 28 x Qwen2VLDecoderLayer(
      (self_attn): Qwen2VLFlashAttention2(
        (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
        (k_proj): Linear(in_features=3584, out_features=512, bias=True)
        (v_proj): Linear(in_features=3584, out_features=512, bias=True)
        (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
        (rotary_emb): Qwen2VLRotaryEmbedding()
      )
      (mlp): Qwen2MLP(
        (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
        (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
        (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
    )
  )
  (norm): Qwen2RMSNorm((3584,), eps=1e-06)
  (rotary_emb): Qwen2VLRotaryEmbedding()
), 'lm_head': Linear(in_features=3584, out_features=152064, bias=False)}, 'config': Qwen2VLConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}
, 'name_or_path': '/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct', 'warnings_issued': {}, 'generation_config': GenerationConfig {
  "attn_implementation": "flash_attention_2",
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.01,
  "top_k": 1,
  "top_p": 0.001
}
, '_keep_in_fp32_modules': None, 'vocab_size': 152064, 'rope_deltas': None, '_is_hf_initialized': True, '_old_forward': <bound method Qwen2VLForConditionalGeneration.forward of Qwen2VLForConditionalGeneration(
  (visual): Qwen2VisionTransformerPretrainedModel(
    (patch_embed): PatchEmbed(
      (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
    )
    (rotary_pos_emb): VisionRotaryEmbedding()
    (blocks): ModuleList(
      (0-31): 32 x Qwen2VLVisionBlock(
        (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
        (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
        (attn): VisionFlashAttention2(
          (qkv): Linear(in_features=1280, out_features=3840, bias=True)
          (proj): Linear(in_features=1280, out_features=1280, bias=True)
        )
        (mlp): VisionMlp(
          (fc1): Linear(in_features=1280, out_features=5120, bias=True)
          (act): QuickGELUActivation()
          (fc2): Linear(in_features=5120, out_features=1280, bias=True)
        )
      )
    )
    (merger): PatchMerger(
      (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=5120, out_features=5120, bias=True)
        (1): GELU(approximate='none')
        (2): Linear(in_features=5120, out_features=3584, bias=True)
      )
    )
  )
  (model): Qwen2VLModel(
    (embed_tokens): Embedding(152064, 3584)
    (layers): ModuleList(
      (0-27): 28 x Qwen2VLDecoderLayer(
        (self_attn): Qwen2VLFlashAttention2(
          (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
          (k_proj): Linear(in_features=3584, out_features=512, bias=True)
          (v_proj): Linear(in_features=3584, out_features=512, bias=True)
          (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
          (rotary_emb): Qwen2VLRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((3584,), eps=1e-06)
    (rotary_emb): Qwen2VLRotaryEmbedding()
  )
  (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)>, '_hf_hook': AlignDevicesHook(execution_device=0, offload=False, io_same_device=True, offload_buffers=False, place_submodules=False, skip_keys='past_key_values'), 'forward': functools.partial(<function add_hook_to_module.<locals>.new_forward at 0x727f2aec7e20>, Qwen2VLForConditionalGeneration(
  (visual): Qwen2VisionTransformerPretrainedModel(
    (patch_embed): PatchEmbed(
      (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
    )
    (rotary_pos_emb): VisionRotaryEmbedding()
    (blocks): ModuleList(
      (0-31): 32 x Qwen2VLVisionBlock(
        (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
        (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
        (attn): VisionFlashAttention2(
          (qkv): Linear(in_features=1280, out_features=3840, bias=True)
          (proj): Linear(in_features=1280, out_features=1280, bias=True)
        )
        (mlp): VisionMlp(
          (fc1): Linear(in_features=1280, out_features=5120, bias=True)
          (act): QuickGELUActivation()
          (fc2): Linear(in_features=5120, out_features=1280, bias=True)
        )
      )
    )
    (merger): PatchMerger(
      (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=5120, out_features=5120, bias=True)
        (1): GELU(approximate='none')
        (2): Linear(in_features=5120, out_features=3584, bias=True)
      )
    )
  )
  (model): Qwen2VLModel(
    (embed_tokens): Embedding(152064, 3584)
    (layers): ModuleList(
      (0-27): 28 x Qwen2VLDecoderLayer(
        (self_attn): Qwen2VLFlashAttention2(
          (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
          (k_proj): Linear(in_features=3584, out_features=512, bias=True)
          (v_proj): Linear(in_features=3584, out_features=512, bias=True)
          (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
          (rotary_emb): Qwen2VLRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((3584,), eps=1e-06)
    (rotary_emb): Qwen2VLRotaryEmbedding()
  )
  (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)), 'to': <function Module.to at 0x727f28b1c7c0>, 'cuda': <function Module.cuda at 0x727f28b1c860>, 'hf_device_map': {'visual': 0, 'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 1, 'model.layers.12': 1, 'model.layers.13': 1, 'model.layers.14': 1, 'model.layers.15': 1, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.norm': 1, 'model.rotary_emb': 1, 'lm_head': 1}}
__doc__: None
__module__: transformers.models.qwen2_vl.modeling_qwen2_vl
__weakref__: None
_auto_class: None
_backward_hooks: OrderedDict()
_backward_pre_hooks: OrderedDict()
_buffers: {}
_compiled_call_impl: None
_forward_hooks: OrderedDict()
_forward_hooks_always_called: OrderedDict()
_forward_hooks_with_kwargs: OrderedDict()
_forward_pre_hooks: OrderedDict()
_forward_pre_hooks_with_kwargs: OrderedDict()
_hf_hook: AlignDevicesHook(execution_device=0, offload=False, io_same_device=True, offload_buffers=False, place_submodules=False, skip_keys='past_key_values')
_hf_peft_config_loaded: False
_is_full_backward_hook: None
_is_hf_initialized: True
/home/sss/anaconda3/lib/python3.12/site-packages/transformers/modeling_utils.py:5055: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
_is_quantized_training_enabled: False
_is_stateful: False
_keep_in_fp32_modules: None
_keys_to_ignore_on_load_missing: None
_keys_to_ignore_on_load_unexpected: None
_keys_to_ignore_on_save: None
_load_state_dict_post_hooks: OrderedDict()
_load_state_dict_pre_hooks: OrderedDict()
_modules: {'visual': Qwen2VisionTransformerPretrainedModel(
  (patch_embed): PatchEmbed(
    (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
  )
  (rotary_pos_emb): VisionRotaryEmbedding()
  (blocks): ModuleList(
    (0-31): 32 x Qwen2VLVisionBlock(
      (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
      (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
      (attn): VisionFlashAttention2(
        (qkv): Linear(in_features=1280, out_features=3840, bias=True)
        (proj): Linear(in_features=1280, out_features=1280, bias=True)
      )
      (mlp): VisionMlp(
        (fc1): Linear(in_features=1280, out_features=5120, bias=True)
        (act): QuickGELUActivation()
        (fc2): Linear(in_features=5120, out_features=1280, bias=True)
      )
    )
  )
  (merger): PatchMerger(
    (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
    (mlp): Sequential(
      (0): Linear(in_features=5120, out_features=5120, bias=True)
      (1): GELU(approximate='none')
      (2): Linear(in_features=5120, out_features=3584, bias=True)
    )
  )
), 'model': Qwen2VLModel(
  (embed_tokens): Embedding(152064, 3584)
  (layers): ModuleList(
    (0-27): 28 x Qwen2VLDecoderLayer(
      (self_attn): Qwen2VLFlashAttention2(
        (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
        (k_proj): Linear(in_features=3584, out_features=512, bias=True)
        (v_proj): Linear(in_features=3584, out_features=512, bias=True)
        (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
        (rotary_emb): Qwen2VLRotaryEmbedding()
      )
      (mlp): Qwen2MLP(
        (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
        (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
        (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
    )
  )
  (norm): Qwen2RMSNorm((3584,), eps=1e-06)
  (rotary_emb): Qwen2VLRotaryEmbedding()
), 'lm_head': Linear(in_features=3584, out_features=152064, bias=False)}
_no_split_modules: ['Qwen2VLDecoderLayer', 'Qwen2VLVisionBlock']
_non_persistent_buffers_set: set()
_parameters: {}
_skip_keys_device_placement: past_key_values
_state_dict_hooks: OrderedDict()
_state_dict_pre_hooks: OrderedDict()
_supports_cache_class: True
_supports_flash_attn_2: True
_supports_flex_attn: False
_supports_quantized_cache: False
_supports_sdpa: True
_supports_static_cache: True
_tied_weights_keys: ['lm_head.weight']
_tp_plan: None
_version: 1
base_model_prefix: model
call_super_init: False
config: Qwen2VLConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}

device: cuda:0
dtype: torch.bfloat16
dummy_inputs: {'input_ids': tensor([[7, 6, 0, 0, 1],
        [1, 2, 3, 0, 0],
        [0, 0, 0, 4, 5]])}
dump_patches: False
framework: pt
generation_config: GenerationConfig {
  "attn_implementation": "flash_attention_2",
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.01,
  "top_k": 1,
  "top_p": 0.001
}

hf_device_map: {'visual': 0, 'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 1, 'model.layers.12': 1, 'model.layers.13': 1, 'model.layers.14': 1, 'model.layers.15': 1, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.norm': 1, 'model.rotary_emb': 1, 'lm_head': 1}
is_gradient_checkpointing: False
is_parallelizable: False
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
main_input_name: input_ids
model_tags: None
name_or_path: /home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct
rope_deltas: None
supports_gradient_checkpointing: True
supports_tp_plan: False
training: False
vocab_size: 152064
warnings_issued: {}
Actual torch_dtype: torch.bfloat16

Model Configuration:
Qwen2VLConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}
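
Several numbers in the configuration and in the module dump can be cross-checked against each other. The arithmetic below is a sketch using values copied from the log above. The interpretation of `mrope_section` as covering half of the head dimension, and the 2x2 patch merge in the vision merger (the merge size is not shown in the printed config), are assumptions.

```python
# Values copied from the printed Qwen2VLConfig and module dump
hidden_size = 3584
num_attention_heads = 28
num_key_value_heads = 4
mrope_section = [16, 24, 24]
vision_embed_dim = 1280  # width of the vision blocks

# Per-head dimension of the language model's attention
head_dim = hidden_size // num_attention_heads

# Grouped-query attention: K and V are projected to num_key_value_heads heads
kv_out_features = num_key_value_heads * head_dim

# Each KV head is shared by this many query heads
queries_per_kv = num_attention_heads // num_key_value_heads

# Assumption: the merger concatenates 2x2 neighbouring patches before its MLP
merger_in_features = vision_embed_dim * 2 * 2

print(head_dim)            # 128
print(kv_out_features)     # 512, matches out_features of k_proj and v_proj
print(queries_per_kv)      # 7
print(sum(mrope_section))  # 64, equal to head_dim // 2
print(merger_in_features)  # 5120, matches in_features of the merger MLP
```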

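Input

[image]

Output (the model answers in Chinese; translated here)

['This picture shows a tranquil lake scene. The lake surface is calm, reflecting the sky and distant mountain peaks. In the middle of the lake is a small island with some trees and shrubs. The island is surrounded by clear water, where some aquatic plants can be seen. Some buildings can be seen in the distance, possibly a residential area or part of a city. The sky is clear and pale blue, giving a feeling of peace and calm.']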
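
The `generation_config` in the log sets `do_sample` to true together with `top_k` 1, `temperature` 0.01 and `top_p` 0.001, so sampling should in practice behave like greedy decoding. The toy sketch below illustrates the `top_k` part in plain Python. It is not the transformers implementation; the function name `sample_top_k` and the logits are made up.

```python
import math
import random
from typing import List


def sample_top_k(logits: List[float], top_k: int, temperature: float,
                 rng: random.Random) -> int:
    """Sample a token index from the top_k highest logits after temperature scaling."""
    # Keep the indices of the top_k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax weights over the kept logits; subtract the max for numerical stability
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 2.5, 1.7, -0.3]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
picks = {sample_top_k(toy_logits, top_k=1, temperature=0.01, rng=rng) for _ in range(100)}
print(picks)  # {1}: with top_k=1 only the highest-scoring token can be drawn
```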
