Mistral AI open-sources yet another formerly closed enterprise model: Mistral-Small-Instruct-2409


Shortly after open-sourcing the Pixtral 12B vision multimodal model, Mistral has now also open-sourced its enterprise-grade small model, Mistral-Small-Instruct-2409 (22B). It is Mistral AI's latest enterprise small model and an upgrade of Mistral Small v24.02. The model is available under the Mistral Research License and gives customers a flexible, cost-effective, fast, and reliable option for translation, summarization, sentiment analysis, and other tasks that do not require a full general-purpose model.

Mistral Small began as Mixtral-8x7B-v0.1 (46.7B total parameters), a sparse mixture-of-experts model that activates roughly 12.9B parameters per token. The model has strong reasoning, broad capabilities, can generate and reason about code, and is multilingual, supporting English, French, German, Italian, and Spanish.
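For intuition, the gap between stored and active parameters is simple arithmetic: Mixtral routes each token through 2 of its 8 expert FFNs. A back-of-the-envelope sketch (the shared and per-expert figures are rough estimates for illustration, not official numbers):

# Rough MoE parameter math for Mixtral-8x7B (2 of 8 experts active per token).
shared = 1.6e9   # estimated non-expert parameters: attention, embeddings, router
expert = 5.6e9   # estimated parameters per expert FFN stack
total = shared + 8 * expert   # ~46.4B parameters stored
active = shared + 2 * expert  # ~12.8B parameters used per token
print(f"total = {total / 1e9:.1f}B, active = {active / 1e9:.1f}B")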

Very exciting. Mistral models always perform exceptionally well, and we now have excellent coverage across many size niches:

  • 8B - Llama 3.1 8B

  • 12B - Nemo 12B

  • 22B - Mistral Small

  • 27B - Gemma-2 27B

  • 35B - Command-R 35B 08-2024

  • 40-60B - GAP (I believe there are two new MoEs in this range, but it turned out llama.cpp does not support them)

  • 70B - Llama 3.1 70B

  • 103B - Command-R+ 103B

  • 123B - Mistral Large 2

  • 141B - WizardLM-2 8x22B

  • 230B - DeepSeek V2/2.5

  • 405B - Llama 3.1 405B


With 22 billion parameters, Mistral Small v24.09 gives customers a convenient middle point between Mistral NeMo 12B and Mistral Large 2, a cost-effective solution that can be deployed across a wide range of platforms and environments. Compared with its predecessor, the new small model shows clear improvements in human alignment, reasoning, and code performance.

Mistral-Small-Instruct-2409 is an instruction fine-tuned version with the following characteristics:

  • 22B parameters
  • Vocabulary size of 32768
  • Supports function calling
  • 128k sequence length

Usage

vLLM (recommended)

Install vLLM >= v0.6.1.post1:

pip install --upgrade vllm

Install mistral_common >= 1.4.1:

pip install --upgrade mistral_common

Offline

from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Mistral-Small-Instruct-2409"

sampling_params = SamplingParams(max_tokens=8192)

# Note: running Mistral-Small on a single GPU requires at least 44 GB of GPU RAM.
# To split the GPU requirement across multiple devices, add e.g. `tensor_parallel_size=2`.
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")

prompt = "How often does the letter r occur in Mistral?"

messages = [
    {
        "role": "user",
        "content": prompt
    },
]

outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)

Server

vllm serve mistralai/Mistral-Small-Instruct-2409 --tokenizer_mode mistral --config_format mistral --load_format mistral

Note: Running Mistral-Small on a single GPU requires at least 44 GB of GPU memory.

To split the GPU requirement across multiple devices, add e.g. `--tensor-parallel-size 2`.

Client

curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer token' \
--data '{
    "model": "mistralai/Mistral-Small-Instruct-2409",
    "messages": [
      {
        "role": "user",
        "content": "How often does the letter r occur in Mistral?"
      }
    ]
}'
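Since the server exposes an OpenAI-compatible API, the same request can also be made from Python with the openai client. This is a minimal sketch: the node URL is a placeholder as in the curl example, and the API key is a dummy value.

from openai import OpenAI

# vLLM's server implements the OpenAI chat completions API,
# so the standard client works with a dummy key.
client = OpenAI(base_url="http://<your-node-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-Instruct-2409",
    messages=[
        {"role": "user", "content": "How often does the letter r occur in Mistral?"}
    ],
)

print(response.choices[0].message.content)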

Mistral-inference

Install mistral_inference >= 1.4.1:

pip install mistral_inference --upgrade

Download

from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', '22B-Instruct-Small')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Mistral-Small-Instruct-2409", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)

Chat

mistral-chat $HOME/mistral_models/22B-Instruct-Small --instruct --max_tokens 256

Instruction following

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(messages=[UserMessage(content="How often does the letter r occur in Mistral?")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)

Function calling

from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
        ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
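Note that the decoded result is a raw string rather than a structured object. Depending on the tokenizer version, it typically contains a JSON list of tool calls, possibly preceded by a [TOOL_CALLS] marker. A minimal sketch of extracting it follows; the exact framing is an assumption, so inspect the actual output first.

import json

# Assumed output shape, e.g.:
# [{"name": "get_current_weather", "arguments": {"location": "Paris, FR", "format": "celsius"}}]
# Strip any leading [TOOL_CALLS] marker before parsing.
payload = result.split("[TOOL_CALLS]")[-1].strip()
tool_calls = json.loads(payload)
for call in tool_calls:
    print(call["name"], call["arguments"])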

Hugging Face Transformers

from transformers import LlamaTokenizerFast, MistralForCausalLM
import torch

device = "cuda"
tokenizer = LlamaTokenizerFast.from_pretrained('mistralai/Mistral-Small-Instruct-2409')
tokenizer.pad_token = tokenizer.eos_token

model = MistralForCausalLM.from_pretrained('mistralai/Mistral-Small-Instruct-2409', torch_dtype=torch.bfloat16)
model = model.to(device)

prompt = "How often does the letter r occur in Mistral?"

messages = [
    {"role": "user", "content": prompt},
 ]

model_input = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(device)
gen = model.generate(model_input, max_new_tokens=150)
dec = tokenizer.batch_decode(gen)
print(dec)

Output

<s>
  [INST]
  How often does the letter r occur in Mistral?
  [/INST]
  To determine how often the letter "r" occurs in the word "Mistral,"
  we can simply count the instances of "r" in the word.
  The word "Mistral" is broken down as follows:
    - M
    - i
    - s
    - t
    - r
    - a
    - l
  Counting the "r"s, we find that there is only one "r" in "Mistral."
  Therefore, the letter "r" occurs once in the word "Mistral."
</s>

It looks like Mistral is trying to use CoT to fix the strawberry problem 🙂
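A practical footnote on the Hugging Face path above: the bf16 weights need roughly 44 GB of GPU memory, as noted earlier. On smaller cards, a common workaround is 4-bit quantization via bitsandbytes. This is a sketch assuming the bitsandbytes package is installed, not something from the model card:

from transformers import BitsAndBytesConfig, MistralForCausalLM
import torch

# NF4 4-bit quantization stores weights at roughly 1/4 the memory of bf16,
# trading some output quality for a much smaller footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = MistralForCausalLM.from_pretrained(
    'mistralai/Mistral-Small-Instruct-2409',
    quantization_config=bnb_config,
    device_map="auto",  # place layers automatically across available devices
)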

References

https://mistral.ai/news/september-24-release/

https://artificialanalysis.ai/models/mistral-small

https://huggingface.co/mistralai/Mistral-Small-Instruct-2409
