CUDA系列-Mem-9

news2025/1/23 12:19:24

这里写目录标题

  • Static Architecture
    • .Abstractions provided by CUSW_UNIT_MEM_MANAGER
      • Memory Object (CUmemobj)
    • Memory Descriptor(CUmemdesc)
    • Memory Block(CUmemblock)
      • Memory Bins
      • Suballocations in Memory Block
      • Functional description
    • Memory Manager

你可能觉得奇怪,这些不是cuda programming model中的内容啊,其实这是cuda runtimes ,还记得那份泄漏出来的代码吗?
This section describes static aspects of the CUSW_UNIT_MEM_MANAGER unit’s architecture.


Static Architecture

The CUSW_UNIT_MEM_MANAGER provides abstractions for allocating memory by other units of CUDA driver or from user space applications through CUSW_UNIT_CUDART. It also provides abstractions to share the memory allocated in the Host with the Device.

CUDA driver sees memory allocations in the form of “Memory objects”. A Memory object represents the chunk of memory allocated. It abstracts all layers involved in the memory management and provides APIs to other units of CUDA driver to allocate/free and map/unmap memory along with other auxiliary functionalities.

To avoid fragmentation of memory, the memory objects are allocated from a bigger fixed size “Memory Block”. Each Memory block has a “Memory Descriptor” which contains all the memory attributes associated with a memory block. The “Memory Manager” maintains all the memory blocks which are allocated in a given CUDA context.

The diagram below shows how CUSW_UNIT_MEM_MANAGER interfaces with different units of CUDA driver. Since memory management needs support from the underlying drivers, CUSW_UNIT_MEM_MANAGER depends on the NVRM driver to accomplish its tasks in memory management. In line with the top-level architectural philosophy (refer to the CUDA Architecture document), the unit also contains parts of Hardware Abstraction Layer (HAL) and Driver Model Abstraction layer (DMAL), which it uses internally.
在这里插入图片描述

.Abstractions provided by CUSW_UNIT_MEM_MANAGER

CUSW_UNIT_MEM_MANAGER primarily consists of three types of abstractions. The Memory object, Memory Descriptor and the Memory Manager. This section describes each abstraction in detail.

Memory Object (CUmemobj)

A memory object represents a memory allocation. It contains the size, device virtual address, host virtual address and other related information about a memory allocation. It is also possible for one context to share memory with other, in which case the memory object also contains the sharing information. Memory object abstracts all underlying implementation and is the only way for other units in the CUDA driver to interface with CUSW_UNIT_MEM_MANAGER.
Functional description
Memory Object abstraction provides functionalities to

Map/unmap an existing memory object to host memory.

Get parent block’s CUmemflags/CUmemdesc for a given memory object.

To get the data needed to share given memory object with another context.

To get the absolute device virtual address of memory object.

To get the memory object for a given Virtual Address or Range.

To get the user-visible device pointer/host pointer for a given memory object and vice versa.

To get the logical byte size of a given memory object.

To get the context of a given memory object.

To get the shared instance of a given memory object.

To check if the memory allocated in host/device and its associated cache attributes.

To check if the memory allocated is pinned/managed memory.

To execute the given cache operation on cpu side on the memory region pointed by given memory object.

To Mark/Unmark the given memory object to ensure that synchronous memory operations on the given memory object are always fully synchronous.

Memory Descriptor(CUmemdesc)

Memory descriptor contains memory attributes associated with a memory allocation.

While requesting for memory, the other units of CUDA driver can provide specific attributes for the memory allocation request. To get/set the attributes for a memory allocation, other units of the CUDA driver can use the interfaces provided by the Memory object.

Memory Block(CUmemblock)

Memory block is a superblock which encapsulates the physical allocation.

Each Memory Block is associated with a Memory Descriptor as explained above which describes the attributes of the memory block. It contains a OS specific structure(dmal) to hold OS driver specific handles and data. The size of a particular memory block is based on specified memory type(like pushbuffer, generic etc) and arch specific requirements for the MMU. Generally the size of a memory block is the size of HUGE Page supported by the Device. Memory block can be created with shared allocation from another context.

Memory Bins

An effective method of avoiding fragmentation of memory is by grouping similar sized allocations together. Since the size of the memory block is big, sub-allocations can be made inside a memory block. Memory bins try to achieve that by creating bins of varying size and associating each memory block with it. For example, 5 bins are created with size 1KB, 4KB, 16KB, 64KB and 256KB during initialization of memory manager. During a memory allocation request when a new memblock is created, based on the allocation request size one of the above memory bin is assigned to the memblock. Future similar sized allocations requests are serviced by suballocating from that memblock.

Suballocations in Memory Block

During a memory allocation request, efforts are made to find an already existing suitable Memory Block which has the same memory attributes(in the form of Memory descriptor) as that of the incoming request and has free area in which the requested size worth of memory can be safely allocated. If such a memory block is found and it belongs to the same memory bin as that of requested size then memory will be sub allocated in that block and a memory object is created to represent the same and returned, otherwise new block is created.

Functional description

Memory Block abstraction provides some functionalities which are used internally and not exposed to components outside CUDA memory manager.

Alloc/Free memblock

Allocate UVA for a given memblock

Map/Unmap given memblock to host

Map/Unmap given memblock to device

To get memory range based on the kind of device mapping chosen

To get the UVA address for a given memblock

Memory Manager

Memory allocations happen within a CUDA Context. So each CUDA context has an instance of Memory Manager which has data structures to track all memory allocations done in a given CUDA context. It also provides synchronization primitives to protect the common data structures from concurrent access.

CUSW_UNIT_MEM_MANAGER maintains different Virtual Address regions within which the VA is assigned for a particular memory allocation request. The choice of the VA region depends mainly on page size and the requested memory type.

The available memory ranges are

Dptr32 - It is the 32 bit device pointer range to which memory is mapped for special allocations that need 32 bit addresses due to some H/W unit requirements

Function Memory - Function memory range for CUDA GPU function code

Small Page Region - If the requested page size is 4KB then allocations are made in this region

Big Page Region - If the requested page size is device specific page size (64KB or 128KB depending on device architecture) then allocations are made in this region.

The class diagram below depicts the relationship between various data types involved in memory management in cuda driver.
在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1840586.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

MacOS之解决:开盖启动问题(七十四)

简介: CSDN博客专家,专注Android/Linux系统,分享多mic语音方案、音视频、编解码等技术,与大家一起成长! 优质专栏:Audio工程师进阶系列【原创干货持续更新中……】🚀 优质专栏:多媒…

LSTM架构的演进:LSTM、xLSTM、LSTM+Transformer

文章目录 1. LSTM2. xLSTM2.1 理论介绍2.2 代码实现 3. LSTMTransformer 1. LSTM 传统的 LSTM (长短期记忆网络) 的计算公式涉及几个关键部分:输入门、遗忘门、输出门和单元状态。 2. xLSTM xLSTM之所以称之为xLSTM就是因为它将LSTM扩展为多个LSTM的变体&#xff…

Spring的自动注入(也称为自动装配)

自动注入(也称为自动装配)是Spring框架中的一个核心概念,它与手动装配相对立,提供了一种更简洁、更灵活的方式来管理Bean之间的依赖关系。 在Spring应用程序中,如果类A依赖于类B,通常需要在类A中定义一个类…

终极版本的Typora上传到博客园和csdn

激活插件 下载网址是这个: https://codeload.github.com/obgnail/typora_plugin/zip/refs/tags/1.9.4 解压之后这样的: 解压之后将plugin,复制到自己的安装目录下的resources 点击安装即可: 更改配置文件 "dependencies&q…

SSMP整合案例

黑马程序员Spring Boot2 文章目录 1、创建项目1.1 新建项目1.2 整合 MyBatis Plus 2、创建表以及对应的实体类2.1 创建表2.2 创建实体类2.2.1 引入lombok,简化实体类开发2.2.2 开发实体类 3、数据层开发3.1 手动导入两个坐标3.2 配置数据源与MyBatisPlus对应的配置3…

第1讲:创建vite工程,使用框架为Vanilla时,语言是typescript,修改http端口的方法

直接在项目根目录创建 vite.config.ts文件。 在该文件中添加内容: import { defineConfig } from vite;export default defineConfig({server: {port: 7777,}, });最后尝试运行package.json中的Debug

【图解IO与Netty系列】Netty编解码器、TCP粘包拆包问题处理、Netty心跳检测机制

Netty编解码器、TCP粘包拆包问题处理、Netty心跳检测机制 Netty编解码器编码器解码器编解码器Netty提供的现成编解码器 TCP粘包拆包问题处理Netty心跳检测机制 Netty编解码器 网络传输是以字节流的形式传输的,而我们的应用程序一般不会直接对字节流进行处理&#x…

金蝶BI方案与奥威BI:智能、高效的数据分析组合

在当今数据驱动的时代,企业对于快速、准确、全面的数据分析需求日益增长。金蝶BI方案和奥威BI SaaS平台正是为满足这一需求而精心打造的智能数据分析工具。 方案见效快 金蝶BI方案以其高效的数据处理能力,能够快速地将海量数据转化为有价值的信息。通过…

跟《经济学人》学英文:2024年6月15日这期 America

America seems immune to the world economy’s problems 美国似乎对世界经济问题免疫 immune to:美 [ɪˈmjun tu] 对…有免疫力;不受…感染;不受…的影响;免疫耐受; Elsewhere, political dysfunction and fiscal…

api-ms-win-crt-runtime-l1-1-0.dll文件丢失的情况要怎么处理?比较靠谱的多种修复方法分享

遇到api-ms-win-crt-runtime-l1-1-0.dll文件丢失的情况实际上是一个常见问题,解决此类问题存在多种方法。首先我们先来了解一下api-ms-win-crt-runtime-l1-1-0.dll文件吧,只有了解了我们才知道怎么去解决这个api-ms-win-crt-runtime-l1-1-0.dll文件丢失的…

【机器学习】计算机图形和深度学习模型NeRF详解(2)

1. 引言 本文是"计算机图形和深度学习模型NeRF详解"系列文章的续篇,进一步深入探讨了NeRF的核心技术。NeRF作为一项突破性技术,因其能够从有限的2D图像中重建出完整的3D场景,而在多个领域,如医学成像、3D场景重建、动画…

Spring中网络请求客户端WebClient的使用详解

Spring中网络请求客户端WebClient的使用详解_java_脚本之家 Spring5的WebClient使用详解-腾讯云开发者社区-腾讯云 在 Spring 5 之前,如果我们想要调用其他系统提供的 HTTP 服务,通常可以使用 Spring 提供的 RestTemplate 来访问,不过由于 …

国际荐酒师携手各国际荐酒师专业委员会深化2024年度合作

国际荐酒师(香港)协会携手广东海上丝绸之路文化促进会及广东省城镇化发展研究会,深化2024年度合作,共同打造品荐与传播大师班培养荐酒师专业人材 近日,国际荐酒师(香港)协会、广东海上丝绸之路…

通过防抖动代码解决ResizeObserver loop completed with undelivered notifications.

通过防抖动代码解决ResizeObserver loop completed with undelivered notifications. 一、报错内容二、解决方案解释: 一、报错内容 我通过el-tabs下的el-tab-pane切换到el-table出现的报错,大致是渲染宽度出现了问题 二、解决方案 扩展原生的 Resiz…

操作系统 文件系统

实验目的: 掌握文件系统设计的基本思想。理解掌握文件系统基本数据结构的设计。理解掌握文件操作中涉及的数据结构访问过程。 实验内容: 1、编程实现一个简单的内存文件系统。实现Linux常见的一些文件操作命令。比如:ls/cat/cp/rm等。 实…

足底筋膜炎怎么治疗效果好得快

足底筋膜炎症状:疼痛是足底筋膜炎最典型和常见的症状。患者通常会感到足跟或足底区域的疼痛,这种疼痛可能表现为刺痛、钝痛或灼热感。疼痛的程度和频率因人而异,但通常会在早晨起床后或长时间休息后首次站立时最为明显。这是因为休息时足底筋…

华为云与AWS负载均衡服务深度对比:性能、成本与可用性

随着云计算的迅速发展,企业对于云服务提供商的选择变得越来越关键。在选择云服务提供商时,负载均衡服务是企业关注的重点之一。我们九河云将深入比较两大知名云服务提供商华为云和AWS的负载均衡服务,从性能、成本和可用性等方面进行对比。 AW…

使用CAPL创建系统变量之sysDefineNamespace

目录 0 前言 1 使用CAPL创建系统变量 0 前言 最近在项目中发现可以通过CAPL来创建系统变量,这样方法在一定程度上提高了代码的统一性和测试的便利性。想要加入HIL自动化测试群的小伙伴欢迎评论区留言或私信,让我们一起进步! 1 使用CAPL创建…

Java23种设计模式(一)

前言 这2个月来,重新出发,从java开发需要的数据库、查询日志工具、开发工具等的安装、环境配置,再到后面的基础学习、数据库学习、扩展学习(maven、mq、设计模式、spring 系列等等),边学边记录&#xff0c…

typeScript debug 调试

以leetcode 20为例 0.首先编写代码 function isValid(s: string): boolean {let stack: string[] []for (let index 0; index < s.length; index) {let x: string s[index]debuggerswitch (x) {case (:stack.push())breakcase [:stack.push(])breakcase {:stack.push(})…