A Survey on Mixture of Experts (Part 2: System Design of Mixture of Experts)


A Survey on Mixture of Experts (Part 1: Algorithm Design of Mixture of Experts)


A Survey on Mixture of Experts

arxiv

github:A-Survey-on-Mixture-of-Experts-in-LLMs

5 System Design of Mixture of Experts

While Mixture of Experts (MoE) has been increasingly leveraged to enhance the capabilities of large language models, its adoption introduces new challenges for existing training and inference systems, owing to the inherently sparse and dynamic nature of its computational workload. GShard [28] introduces expert parallelism, which implements parallel gating and expert computation by dispatching partitioned local tokens under a load-balancing limit on expert capacity. Since then, expert parallelism has emerged as a fundamental strategy for scaling MoE models efficiently. This approach can be viewed as an augmentation of data parallelism [197], [198], [199], where each expert in an MoE layer is assigned to a distinct device, while all non-expert layers are replicated across devices. As depicted in Figure 8(a), the process flow of expert parallelism consists of the following sequential operations: gate routing, input encode, All-to-All dispatch, expert computation, All-to-All combine, and output decode. In general, the input to a general matrix multiply (GEMM) needs to be large enough to achieve the utilization and throughput that the computing device requires. Therefore, input encode is employed to aggregate the input tokens routed to the same expert into a contiguous memory space, as determined by the token-expert mapping produced by gate routing. Subsequently, the All-to-All dispatch sends the input tokens to their corresponding experts across the distributed devices. Following the localized computation by the experts, the inverse process (All-to-All combine and output decode) reinstates the original data layout according to the gating indices.
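
To make this sequence concrete, the sketch below runs one expert-parallel MoE layer with top-1 routing and one expert per rank, using torch.distributed primitives. It is a minimal illustration under those assumptions (no expert-capacity limit, no token dropping, an All-to-All-capable backend such as NCCL); it is not the GShard implementation, and names such as `moe_forward` are ours.

```python
import torch
import torch.distributed as dist

def moe_forward(x, gate, experts, group=None):
    """x: [num_tokens, d_model]; gate: nn.Linear mapping d_model -> num_experts;
    experts: list holding the expert module(s) placed on this rank."""
    num_experts = gate.out_features
    assert num_experts == dist.get_world_size(group)  # one expert per rank here

    # 1) Gate routing: pick the top-1 expert for each token.
    scores = torch.softmax(gate(x), dim=-1)
    expert_idx = scores.argmax(dim=-1)                 # [num_tokens]

    # 2) Input encode: sort tokens so that each expert's tokens are contiguous.
    order = torch.argsort(expert_idx)
    x_sorted = x[order]
    counts = torch.bincount(expert_idx, minlength=num_experts)

    # 3) All-to-All dispatch: exchange token counts, then the tokens themselves.
    recv_counts = torch.empty_like(counts)
    dist.all_to_all_single(recv_counts, counts, group=group)
    recv_buf = x_sorted.new_empty(int(recv_counts.sum()), x.size(-1))
    dist.all_to_all_single(recv_buf, x_sorted,
                           output_split_sizes=recv_counts.tolist(),
                           input_split_sizes=counts.tolist(), group=group)

    # 4) Expert computation on the expert hosted by this rank.
    out_local = experts[0](recv_buf)

    # 5) All-to-All combine: return each token's result to the rank that owns it.
    combined = torch.empty_like(x_sorted)
    dist.all_to_all_single(combined, out_local,
                           output_split_sizes=counts.tolist(),
                           input_split_sizes=recv_counts.tolist(), group=group)

    # 6) Output decode: undo the sort to restore the original token order.
    #    (Gate-score scaling of the output is omitted for brevity.)
    out = torch.empty_like(combined)
    out[order] = combined
    return out
```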

Furthermore, the synergy of expert parallelism [36], [132], [135], [200], [201] with other existing parallel strategies (tensor [202], [203], [204], pipeline [205], [206], [207], and sequence parallelism [208], [209], [210]) has been investigated to enhance the scalability and efficiency of MoE models in large-scale distributed environments. As shown in Figure 8, we illustrate several examples of hybrid parallelism, encompassing (b) data + expert + tensor parallelism [36], [66], [132], [135], [138], (c) data + expert + pipeline parallelism [132], [134], [138], and (d) expert + tensor parallelism [67]. It is important to recognize that the choice of distributed parallelism strategy governs a complex interplay among computation efficiency, communication overhead, and memory occupation, which is further affected by the hardware configuration. Consequently, deployment strategies for practical applications necessitate nuanced trade-offs and bespoke designs tailored to specific use-case scenarios.
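
A minimal sketch of how a flat set of ranks can be carved into the orthogonal process groups used by hybrid layouts such as Figure 8(b)-(d): ranks along one axis form expert-parallel groups (communicating via All-to-All), while ranks along the other axis form data-parallel groups (All-Reduce over replicated non-expert parameters). The function name and grid layout are illustrative assumptions; production frameworks expose their own configuration for this.

```python
import torch.distributed as dist

def build_groups(world_size: int, expert_parallel_size: int):
    """Arrange ranks in a (data_parallel x expert_parallel) grid and return
    the two process groups that the calling rank belongs to."""
    assert world_size % expert_parallel_size == 0
    data_parallel_size = world_size // expert_parallel_size
    rank = dist.get_rank()
    ep_group, dp_group = None, None

    # Ranks in the same row host different experts (expert parallelism, All-to-All).
    for row in range(data_parallel_size):
        ranks = list(range(row * expert_parallel_size,
                           (row + 1) * expert_parallel_size))
        g = dist.new_group(ranks)          # every rank must create every group
        if rank in ranks:
            ep_group = g

    # Ranks in the same column replicate the same experts (data parallelism, All-Reduce).
    for col in range(expert_parallel_size):
        ranks = list(range(col, world_size, expert_parallel_size))
        g = dist.new_group(ranks)
        if rank in ranks:
            dp_group = g

    return ep_group, dp_group
```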

In the subsequent discussion, we delineate the challenges introduced by MoE models from the computation, communication, and storage perspectives, and concurrently review existing research addressing these issues. Table 4 shows an overview of the open-source MoE frameworks.




5.1 Computation

Although MoE is designed to scale model parameters efficiently without increasing computational demand, it encounters challenges pertaining to computational efficiency. One concern is the imbalance of computational load across distributed devices employing expert parallelism, which incurs significant synchronization overhead as the system waits for the most heavily loaded expert to finish processing. Such issues are typically addressed through algorithmic strategies, such as optimized gating mechanisms and expert capacity adjustments, as discussed in Section 4.1. Besides, solutions like SE-MoE [133], Tutel [132], FlexMoE [137] and SmartMoE [138] have introduced dynamic expert placement strategies to distribute the workload as evenly as possible among devices. Additionally, FasterMoE [134] has implemented a novel dynamic shadowed expert strategy, replicating experts on multiple devices to mitigate severe load imbalance. These model-placement strategies impact both computation and communication efficiency.
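
As an illustration of the placement idea (not the specific algorithms of SE-MoE, FlexMoE, or SmartMoE), the sketch below greedily assigns experts, heaviest first, to the currently least-loaded device, a common heuristic for evening out per-device work. The function name and example loads are illustrative.

```python
import heapq

def place_experts(expert_loads: list[int], num_devices: int) -> dict[int, int]:
    """Return a mapping expert_id -> device_id that balances observed token loads."""
    # Min-heap of (accumulated_load, device_id): the least-loaded device is always on top.
    heap = [(0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    placement = {}
    # Assign the heaviest experts first, each onto the least-loaded device so far.
    for expert_id in sorted(range(len(expert_loads)), key=lambda e: -expert_loads[e]):
        load, device = heapq.heappop(heap)
        placement[expert_id] = device
        heapq.heappush(heap, (load + expert_loads[expert_id], device))
    return placement

# Example: 8 experts with skewed loads placed on 4 devices.
print(place_experts([900, 40, 30, 500, 20, 10, 300, 200], 4))
```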

Another concern is that MoE introduces additional computational overhead through operations including gate routing, input encode, and output decode. Unlike expert computations, which mirror operations in dense models and benefit from extensive optimization on prevalent hardware such as GPUs, these MoE operations are characterized by redundant computation and memory movement, resulting in low efficiency on computing devices. Therefore, recent studies like DeepSpeed-MoE [66], FastMoE [131], HetuMoE [136] and Tutel [132] have focused on the development of tailored GPU kernels to enhance the efficiency of MoE operations.

In contexts where multiple experts are deployed on a single GPU device, MegaBlocks [139] reformulates MoE computation in terms of block-sparse operations, developing specialized block-sparse GPU kernels that efficiently handle the dynamic workloads without dropping tokens. Zheng et al. [141] propose PIT, a deep-learning compiler tailored for the dynamic sparsity of MoE, which can find feasible PIT rules for all the operators within a model and generate optimized GPU kernels for them. PIT employs a novel tiling mechanism, utilizing the Permutation Invariant Transformation (PIT), a mathematically proven property, to transform multiple sparsely located micro-tiles into a GPU-efficient dense tile without changing the computation results, thereby achieving both high GPU utilization and low coverage waste. Despite these advancements, Tan et al. [140] highlight remaining optimization potential within current MoE frameworks such as MegaBlocks and PIT, which commence with an initial scatter-to-group data copy that increases the memory footprint and requires a translation of the MoE problem into a sparse matrix format. Although this translation contributes minimally to computation overhead, it imposes limitations on the transparency and adaptability of extending MegaBlocks to modules beyond the FFN. To address these issues, Tan et al. [140] propose ScatterMoE, an MoE implementation designed to effectively minimize the memory footprint. ScatterMoE leverages ParallelLinear, a linear module capable of executing grouped matrix operations on scattered groups. This approach yields intermediate representations (e.g., the hidden states of an SMoE MLP) that are directly accessible as standard PyTorch tensors, allowing for easy extensions of MoE methods to other types of expert modules.
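
The core computation that both MegaBlocks and ScatterMoE accelerate is a grouped matrix multiplication over tokens scattered across experts. The naive PyTorch loop below shows that computation in its simplest form; it gathers each expert's rows explicitly, which is exactly the extra memory movement that the specialized block-sparse and ParallelLinear kernels avoid. Function and variable names here are illustrative.

```python
import torch

def grouped_expert_mlp(x, expert_idx, w1, w2):
    """x: [tokens, d]; expert_idx: [tokens]; w1: [E, d, h]; w2: [E, h, d]."""
    out = torch.zeros_like(x)
    for e in range(w1.size(0)):
        rows = (expert_idx == e).nonzero(as_tuple=True)[0]
        if rows.numel() == 0:
            continue
        h = torch.relu(x[rows] @ w1[e])   # per-expert FFN on the gathered rows
        out[rows] = h @ w2[e]             # scatter the results back in place
    return out

# Tiny usage example: 16 tokens of width 8, 4 experts with hidden size 32.
x = torch.randn(16, 8)
idx = torch.randint(0, 4, (16,))
out = grouped_expert_mlp(x, idx, torch.randn(4, 8, 32), torch.randn(4, 32, 8))
```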




5.2 Communication

In expert parallelism, the four invocations of All-to-All communication during the forward and backward propagation of each MoE layer cause significant overhead, and can even emerge as the primary constraint on efficiency. The All-to-All communication paradigm encompasses both intra-node (via PCIe or pre-4th-generation NVLink) and inter-node (Ethernet, InfiniBand, or 4th-generation NVLink) communication channels. The efficiency of such communication is contingent upon a multitude of factors, including the heterogeneity of channel bandwidths, the network topology, and the collective communication algorithms. Moreover, load imbalances intrinsic to MoE may exacerbate these inefficiencies by inducing synchronization delays.

To exploit the high intra-node bandwidth and cope with the low inter-node bandwidth, DeepSpeed-MoE [66], HetuMoE [136] and ScheMoE [147] have introduced hierarchical All-to-All communication strategies that increase intra-node exchange and reduce inter-node data exchange. Besides, FasterMoE [134], TA-MoE [143] and SE-MoE [133] have introduced topology-aware routing strategies aimed at mitigating cross-node expert selection, thereby reducing the inter-node communication burden. Additionally, ExFlow [142] exploits expert affinity, anticipating expert allocation across layers to maximize the retention of token processing within local GPU confines. Strategically allocating experts to minimize network traffic and leverage high-bandwidth connections is a prevalent approach in distributed MoE systems [66], [67], [135], and it is often integrated with the placement design of non-expert modules to optimize overall system performance.
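
The sketch below illustrates the two-stage idea behind hierarchical All-to-All with equal-sized chunks: an intra-node exchange first groups data by the destination GPU's local index, and a single inter-node exchange then delivers it, so each GPU sends one large message per remote node rather than many small ones. The group construction, rank ordering (ranks enumerated node by node), and equal-chunk assumption are simplifications relative to the DeepSpeed-MoE, HetuMoE, and ScheMoE implementations.

```python
import torch
import torch.distributed as dist

def hierarchical_all_to_all(send, intra_group, inter_group, nodes, local_size):
    """send: [world, chunk, d]; row j is the chunk destined for global rank j,
    with j = dst_node * local_size + dst_local."""
    world, chunk, d = send.shape

    # Stage 1 (intra-node): route each chunk to the local GPU whose local index
    # matches the destination GPU's local index.
    buf = send.view(nodes, local_size, chunk, d).transpose(0, 1).contiguous()
    recv1 = torch.empty_like(buf)                      # [local_size, nodes, chunk, d]
    dist.all_to_all_single(recv1.view(local_size, -1), buf.view(local_size, -1),
                           group=intra_group)
    # recv1[src_local, dst_node] now holds the chunk from local GPU src_local
    # destined for (dst_node, my_local_rank).

    # Stage 2 (inter-node): one exchange among GPUs sharing the same local index.
    buf2 = recv1.transpose(0, 1).contiguous()          # [nodes, local_size, chunk, d]
    recv2 = torch.empty_like(buf2)
    dist.all_to_all_single(recv2.view(nodes, -1), buf2.view(nodes, -1),
                           group=inter_group)
    # recv2[src_node, src_local] is the chunk sent to this GPU by global rank
    # src_node * local_size + src_local.
    return recv2.view(world, chunk, d)
```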

Since communication and computation can proceed concurrently, pipelining [205], [206], [207] is commonly employed to overlap their execution and thereby reduce the total time cost. This technique, integrated in systems such as Tutel [132], FasterMoE [134], PipeMoE [146] and MPipeMoE [144], orchestrates the overlap between All-to-All communication and expert computation. Notably, Lancet [145] underscores the inherent constraints of these pipelining methods, particularly the bounded duration for which expert computation and communication can overlap. To address this limitation, Lancet partitions non-MoE computations and integrates them into the pipeline during the forward pass, and strategically schedules gradient weight computations to augment the overlap in the backward pass. Punniyamurthy et al. [148] also emphasize the challenge posed by collective communications, which often lie on the critical path, noting the difficulty of hiding their latency by overlapping kernel-granular communication and computation due to the absence of independent computation. Their solution fuses computation with the dependent collective communication by leveraging the GPU's massive parallelism and GPU-initiated communication.
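
A minimal sketch of this overlap, in the spirit of (but not identical to) Tutel and PipeMoE: the token buffer is split into micro-chunks, and the All-to-All for chunk i+1 is issued asynchronously while the expert processes chunk i. Equal splits across ranks and an All-to-All-capable backend (e.g., NCCL) are assumed.

```python
import torch
import torch.distributed as dist

def pipelined_dispatch_compute(chunks, expert, group=None):
    """chunks: list of [tokens, d] tensors, each evenly divisible across ranks."""
    recv = [torch.empty_like(c) for c in chunks]
    outputs = []
    # Pre-issue the dispatch of the first chunk.
    handle = dist.all_to_all_single(recv[0], chunks[0], group=group, async_op=True)
    for i in range(len(chunks)):
        handle.wait()                                  # chunk i has arrived
        if i + 1 < len(chunks):                        # overlap: start moving chunk i+1
            handle = dist.all_to_all_single(recv[i + 1], chunks[i + 1],
                                            group=group, async_op=True)
        outputs.append(expert(recv[i]))                # compute on chunk i meanwhile
    return outputs
```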

Aiming to break these inherent dependencies and thereby extend the overlap duration, ScMoE [110] restructures the MoE architecture to process representations from preceding layers simultaneously with the current layer's representations. This decoupling of communication dependencies facilitates substantial, and in certain cases complete, overlap between communication and computation. Snowflake Arctic [32] employs a similar design, utilizing a Dense-MoE hybrid transformer architecture to overlap communication with computation.




5.3 Storage

The ever-increasing parameter counts of MoE models exacerbate the constraints posed by the memory capacity of compute devices, a challenge already pronounced for dense models. While expert parallelism offers a mitigation strategy by distributing experts across multiple devices, individual devices may still struggle to accommodate numerous experts, particularly in inference contexts where device capacity, such as that of edge devices (PCs, smartphones, IoT devices), is inherently more restricted.

Considering the hierarchical storage pyramid, solutions like SE-MoE [133], Pre-gated MoE [149], and EdgeMoE [150] selectively retain only the essential non-expert parameters and the active expert parameters within the GPU's High-Bandwidth Memory (HBM), offloading inactive expert parameters to CPU memory or SSDs. Because these schemes incur additional overhead from data transfer across the storage hierarchy, they integrate expert-selection forecasting and expert-parameter prefetching techniques to overlap parameter access with computation.
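
The sketch below illustrates the offload-and-prefetch pattern: inactive expert weights stay in pinned CPU memory, and the experts predicted for an upcoming layer are copied to the GPU on a side stream so the transfer overlaps with ongoing computation. The class and its interface are hypothetical stand-ins, not the APIs of SE-MoE, Pre-gated MoE, or EdgeMoE.

```python
import torch

class ExpertCache:
    def __init__(self, cpu_experts):
        # cpu_experts: list of state_dicts; keep them in pinned host memory so
        # host-to-device copies can be asynchronous.
        self.cpu_experts = [{k: v.pin_memory() for k, v in sd.items()}
                            for sd in cpu_experts]
        self.copy_stream = torch.cuda.Stream()
        self.gpu_cache = {}

    def prefetch(self, expert_ids):
        """Start host-to-device copies for the experts predicted to be needed next."""
        with torch.cuda.stream(self.copy_stream):
            for e in expert_ids:
                if e not in self.gpu_cache:
                    self.gpu_cache[e] = {k: v.to("cuda", non_blocking=True)
                                         for k, v in self.cpu_experts[e].items()}

    def get(self, expert_id):
        """Wait for prefetched weights; fall back to a synchronous fetch on a miss."""
        torch.cuda.current_stream().wait_stream(self.copy_stream)
        if expert_id not in self.gpu_cache:
            self.gpu_cache[expert_id] = {k: v.to("cuda")
                                         for k, v in self.cpu_experts[expert_id].items()}
        return self.gpu_cache[expert_id]
```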

In addition, MPipeMoE [144] introduces a strategy to reduce the memory overhead associated with activations and temporary buffers. This is achieved by sharing buffers across different tensor partitions, while leveraging recomputation, communication, and CPU offloading to recover the requisite activations in the backward pass.
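
As one concrete instance of recovering activations by recomputation (one ingredient of MPipeMoE's strategy, shown here with PyTorch's generic gradient checkpointing rather than MPipeMoE's own mechanism):

```python
import torch
from torch.utils.checkpoint import checkpoint

def memory_light_expert_forward(expert, x):
    # The expert's intermediate activations are not stored during the forward
    # pass; they are recomputed when gradients are needed in the backward pass.
    return checkpoint(expert, x, use_reentrant=False)
```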

