ollama大模型qwen2:7b性能测试

news2025/1/10 20:42:09

部署環境信息:

(base) root@alg-dev17:/opt# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         45 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           8
    Stepping:            4
    BogoMIPS:            4589.21
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable no
                         nstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid
                         _fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec
                          xgetbv1 xsaves arat pku ospke md_clear flush_l1d arch_capabilities
Virtualization features: 
  Hypervisor vendor:     VMware
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    8 MiB (8 instances)
  L3:                    198 MiB (8 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7
Vulnerabilities:         
  Gather data sampling:  Unknown: Dependent on hypervisor status
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT Host state unknown
  Retbleed:              Mitigation; IBRS
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Syscall hardening, KVM SW loop
  Srbds:                 Not affected
  Tsx async abort:       Not affected

(base) root@alg-dev17:/opt# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       2.4Gi        12Gi        18Mi       1.2Gi        10Gi
Swap:          3.8Gi          0B       3.8Gi


(base) root@alg-dev17:/opt# nvidia-smi
Fri Jun 28 09:17:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  |   00000000:0B:00.0 Off |                  N/A |
| 23%   28C    P8              8W /  250W |       3MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

大模型:

(base) root@alg-dev17:/opt# ollama list
NAME            ID              SIZE    MODIFIED     
qwen2:7b        e0d4e1163c58    4.4 GB  22 hours ago

ollama服務配置:

(base) root@alg-dev17:/opt# cat /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin"
Environment="OLLAMA_NUM_PARALLEL=16"
Environment="OLLAMA_MAX_LOADED_MODELS=4"
Environment="OLLAMA_HOST=0.0.0.0"


執行脚本:参照一个csdn用户的分享的脚本

import aiohttp
import asyncio
import time
from tqdm import tqdm

import random

questions = [
    "Why is the sky blue?", "Why do we dream?", "Why is the ocean salty?", "Why do leaves change color?",
    "Why do birds sing?", "Why do we have seasons?", "Why do stars twinkle?", "Why do we yawn?",
    "Why is the sun hot?", "Why do cats purr?", "Why do dogs bark?", "Why do fish swim?",
    "Why do we have fingerprints?", "Why do we sneeze?", "Why do we have eyebrows?", "Why do we have hair?",
    "Why do we have nails?", "Why do we have teeth?", "Why do we have bones?", "Why do we have muscles?",
    "Why do we have blood?", "Why do we have a heart?", "Why do we have lungs?", "Why do we have a brain?",
    "Why do we have skin?", "Why do we have ears?", "Why do we have eyes?", "Why do we have a nose?",
    "Why do we have a mouth?", "Why do we have a tongue?", "Why do we have a stomach?", "Why do we have intestines?",
    "Why do we have a liver?", "Why do we have kidneys?", "Why do we have a bladder?", "Why do we have a pancreas?",
    "Why do we have a spleen?", "Why do we have a gallbladder?", "Why do we have a thyroid?", "Why do we have adrenal glands?",
    "Why do we have a pituitary gland?", "Why do we have a hypothalamus?", "Why do we have a thymus?", "Why do we have lymph nodes?",
    "Why do we have a spinal cord?", "Why do we have nerves?", "Why do we have a circulatory system?", "Why do we have a respiratory system?",
    "Why do we have a digestive system?", "Why do we have an immune system?"
]

async def fetch(session, url):
    """
    参数:
        session (aiohttp.ClientSession): 用于请求的会话。
        url (str): 要发送请求的 URL。
    
    返回:
        tuple: 包含完成 token 数量和请求时间。
    """
    start_time = time.time()

    # 随机选择一个问题
    question = random.choice(questions) # <--- 这两个必须注释一个

    # 固定问题                                 
    # question = questions[0]             # <--- 这两个必须注释一个

    # 请求的内容
    json_payload = {
        "model": "qwen2:7b",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "temperature": 0.7 # 参数使用 0.7 保证每次的结果略有区别
    }
    async with session.post(url, json=json_payload) as response:
        response_json = await response.json()
        print(f"{response_json}")
        end_time = time.time()
        request_time = end_time - start_time
        completion_tokens = response_json['usage']['completion_tokens'] # 从返回的参数里获取生成的 token 的数量
        return completion_tokens, request_time

async def bound_fetch(sem, session, url, pbar):
    # 使用信号量 sem 来限制并发请求的数量,确保不会超过最大并发请求数
    async with sem:
        result = await fetch(session, url)
        pbar.update(1)
        return result

async def run(load_url, max_concurrent_requests, total_requests):
    """
    通过发送多个并发请求来运行基准测试。
    
    参数:
        load_url (str): 要发送请求的URL。
        max_concurrent_requests (int): 最大并发请求数。
        total_requests (int): 要发送的总请求数。
    
    返回:
        tuple: 包含完成 token 总数列表和响应时间列表。
    """
    # 创建 Semaphore 来限制并发请求的数量
    sem = asyncio.Semaphore(max_concurrent_requests)
    
    # 创建一个异步的HTTP会话
    async with aiohttp.ClientSession() as session:
        tasks = []
        
        # 创建一个进度条来可视化请求的进度
        with tqdm(total=total_requests) as pbar:
            # 循环创建任务,直到达到总请求数
            for _ in range(total_requests):
                # 为每个请求创建一个任务,确保它遵守信号量的限制
                task = asyncio.ensure_future(bound_fetch(sem, session, load_url, pbar))
                tasks.append(task)  # 将任务添加到任务列表中
            
            # 等待所有任务完成并收集它们的结果
            results = await asyncio.gather(*tasks)
        
        # 计算所有结果中的完成token总数
        completion_tokens = sum(result[0] for result in results)
        
        # 从所有结果中提取响应时间
        response_times = [result[1] for result in results]
        
        # 返回完成token的总数和响应时间的列表
        return completion_tokens, response_times

if __name__ == '__main__':
    import sys

    if len(sys.argv) != 3:
        print("Usage: python bench.py <C> <N>")
        sys.exit(1)

    C = int(sys.argv[1])  # 最大并发数
    N = int(sys.argv[2])  # 请求总数

    # vllm 和 ollama 都兼容了 openai 的 api 让测试变得更简单了
    url = 'http://10.1.9.167:11434/v1/chat/completions'

    start_time = time.time()
    completion_tokens, response_times = asyncio.run(run(url, C, N))
    end_time = time.time()

    # 计算总时间
    total_time = end_time - start_time
    # 计算每个请求的平均时间
    avg_time_per_request = sum(response_times) / len(response_times)
    # 计算每秒生成的 token 数量
    tokens_per_second = completion_tokens / total_time

    print(f'Performance Results:')
    print(f'  Total requests            : {N}')
    print(f'  Max concurrent requests   : {C}')
    print(f'  Total time                : {total_time:.2f} seconds')
    print(f'  Average time per request  : {avg_time_per_request:.2f} seconds')
    print(f'  Tokens per second         : {tokens_per_second:.2f}')

運行結果1:

  Performance Results:
  Total requests            : 2000
  Max concurrent requests   : 50
  Total time                : 8360.14 seconds
  Average time per request  : 206.93 seconds
  Tokens per second         : 83.43

運行結果2:




显存占用情况:

(base) root@alg-dev17:~# nvidia-smi
Thu Jun 27 16:21:36 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  |   00000000:0B:00.0 Off |                  N/A |
| 35%   64C    P2            212W /  250W |    7899MiB /  11264MiB |     83%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      9218      C   ...unners/cuda_v11/ollama_llama_server       7896MiB |
+-----------------------------------------------------------------------------------------+

仅供参照,转载请注明出处!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1872252.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

Application Studio 学习笔记(3)

一、工具栏按钮 1、panel控件添加工具栏按钮 展开panel控件的Advanced属性并点击Action Data&#xff0c;进入Action Data编辑界面 新增Action Data数据&#xff0c;Sequence设定工具按钮的显示顺序 默认工具按钮会显示在弹出工具栏中 勾选Add to Primary ToolBar后&#xff…

电容式单按键触摸检测及接近感应控制芯片 SD8233LQG

SD8233LQG是一款电容式单按键触摸检测及接近感应控制芯片。采用 CMOS 工艺制造&#xff0c;内建稳压和去抖动电路&#xff0c;高可靠性&#xff0c;专为取代传统按键开关而设计。超低功耗与宽工作电压特性&#xff0c;广泛应用于 TWS及 DC 应用的触摸产品中&#xff0c;实现产品…

vue3 中的根据某些特定的文字来筛选数组数据

现在有一批这样的数据 这样的数据 我想根据 hallName 来筛选数据 比如关键字有 我不需要 带有下面字符换的数组数据 const importantData ref(["VIP", "CINITY", "杜比", "IMAX", "4DX", vip, Vip]) 使用some 方法 arr…

笔记-python爬虫概述

目录 常用第三方库 爬虫框架 动态页面渲染1. url请求分析2. selenium3. phantomjs4. splash5. spynner 爬虫防屏蔽策略1. 修改User-Agent2. 禁止cookies3. 设置请求时间间隔4. 代理IP池5. 使用Selenium6. 破解验证码常用第三方库 对于爬虫初学者&#xff0c;建议在了解爬虫原…

FFmpeg中位操作相关的源码:GetBitContext结构体,init_get_bits函数、get_bits1函数和get_bits函数分析

一、引言 由《音视频入门基础&#xff1a;H.264专题&#xff08;3&#xff09;——EBSP, RBSP和SODB》可以知道&#xff0c;H.264 码流中的操作单位是位(bit)&#xff0c;而不是字节。因为视频的传输和存贮是十分在乎体积的&#xff0c;对于每一个比特&#xff08;bit&#xf…

CentOS安装Docker教程(包含踩坑的经验)

目录 一.基础安装 ▐ 安装Docker 二.启动Docker服务 三.配置Docker镜像加速 一.基础安装 在安装Docker之前可能需要先做以下准备 首先如果系统中已经存在旧的Docker&#xff0c;则先卸载&#xff1a; yum remove docker \docker-client \docker-client-latest \docker-…

Modbus转Profibus网关在汽车行业的应用

一、前言 在当前汽车工业的快速发展中&#xff0c;汽车制造商正通过自动化技术实现生产的自动化&#xff0c;目的是提高生产效率和减少成本。Modbus转Profibus网关&#xff08;XD-MDPB100&#xff09;应用于汽车行业&#xff0c;主要体现在提升自动化水平、优化数据传输以及实…

Java8新特性stream的原理和使用

这是一种流式惰性计算&#xff0c;整体过程是&#xff1a; stream的使用也异常方便&#xff0c;可以对比如List、Set之类的对象进行流式计算&#xff0c;挑出最终想要的结果&#xff1a; List<Timestamp> laterTimes allRecords.stream().map(Record::getTime).filter…

第11章 规划过程组(收集需求)

第11章 规划过程组&#xff08;一&#xff09;11.3收集需求&#xff0c;在第三版教材第375~378页&#xff1b; 文字图片音频方式 第一个知识点&#xff1a;工具与技术 1、决策 投票 用于生成、归类和排序产品需求 独裁型决策制定 一个人负责为整个集体制定决策 多标准决策分析…

软件需求管理规程(DOC原件)

软件需求管理规程是确保软件开发过程中需求清晰、一致、可追踪的关键环节&#xff1a; 明确需求&#xff1a;项目初期&#xff0c;与利益相关者明确项目目标和需求&#xff0c;确保需求完整、无歧义。需求评审&#xff1a;组织专家团队对需求进行评审&#xff0c;识别潜在风险和…

DM达梦数据库存储过程

&#x1f49d;&#x1f49d;&#x1f49d;首先&#xff0c;欢迎各位来到我的博客&#xff0c;很高兴能够在这里和您见面&#xff01;希望您在这里不仅可以有所收获&#xff0c;同时也能感受到一份轻松欢乐的氛围&#xff0c;祝你生活愉快&#xff01; &#x1f49d;&#x1f49…

Arduino - OLED

Arduino - OLED Arduino - OLED Arduino通过u8g2库驱动OLEDU8g2 驱动oled自定义中文字库 The OLED (Organic Light-Emitting Diode) display is an alternative for LCD display. The OLED is super-light, almost paper-thin, flexible, and produce a brighter and crisper…

【乐吾乐2D可视化组态编辑器】图形库、组件库

15 图形库 15.1 图纸 新建文件夹、新建图纸、删除文件夹、删除图纸 15.2 系统组件 乐吾乐图形库一共分为三大类&#xff1a;基础图形库、电力图形库、物联网图形库、2.5D图形库&#xff0c;总共约4000个图元&#xff0c;能满足大部分行业的基本需求。 格式有三种&#xff1a…

智慧法务引领:构筑数字化法治核心,塑造未来企业竞争力

在全球化及信息化时代背景下&#xff0c;企业面临的法律环境越来越复杂&#xff0c;法治数字化成为企业维护合法权益、提升市场竞争力的必然选择。智慧法务管理系统作为推动企业法治数字化转型的重要工具&#xff0c;不仅提高了法律服务效率&#xff0c;而且加强了企业的法律风…

百问网全志D1h开发板投屏功能实现

投屏功能实现 D1系列号称点屏神器&#xff0c;不仅能点屏&#xff0c;还能用于投屏。 源码准备 百问网为 【百问网D1h开发板】提供了投屏功能需要使用的源码&#xff0c;直接git下载即可&#xff1a; git clone https://github.com/DongshanPI/DongshannezhaSTU_DLNA_Scree…

MoneyPrinterPlus:AI自动短视频生成工具-微软云配置详解

MoneyPrinterPlus可以使用大模型自动生成短视频&#xff0c;我们可以借助Azure提供的语音服务来实现语音合成和语音识别的功能。 Azure的语音服务应该是我用过的效果最好的服务了&#xff0c;微软还得是微软。 很多小伙伴可能不知道应该如何配置&#xff0c;这里给大家提供一…

沉淀强化镍基合金660大螺丝的物理性能

沉淀强化镍基合金660大螺丝&#xff0c;是一种高性能的工程材料&#xff0c;其在极端环境中展现了优异的稳定性和耐用性。以下&#xff0c;我们将深入解析其主要的物理性能。 首先&#xff0c;该合金螺丝的密度为7.99g/cm&#xff0c;这意味着它具有较高的质量密度&#xff0c;…

力扣随机一题 6/28 数组/矩阵

&#x1f4dd;个人主页&#x1f339;&#xff1a;誓则盟约⏩收录专栏⏪&#xff1a;IT 竞赛&#x1f921;往期回顾&#x1f921;&#xff1a;6/27 每日一题关注博主&#xff0c;后期持续更新系列文章如果有错误感谢请大家批评指出&#xff0c;及时修改感谢大家点赞&#x1f44d…

GEOS学习笔记(一)

下载编译GEOS 从Download and Build | GEOS (libgeos.org)下载geos-3.10.6.tar.bz2 使用cmake-3.14.0版本配置VS2015编译 按默认配置生成VS工程文件 编译后生成geos.dll&#xff0c;geos_c.dll 后面学习使用C接口进行编程

MySQL中的常用逻辑操作符

逻辑运算符在MySQL查询中扮演着重要角色&#xff0c;通过AND、OR、NOT等运算符的组合使用&#xff0c;可以提高查询的准确性和灵活性&#xff0c;确保查询结果满足业务需求。合理使用这些运算符还能优化查询性能&#xff0c;减少不必要的数据检索&#xff0c;并提高SQL语句的可…