基于AMDGPU-ROCm的深度学习环境搭建

news2024/10/3 4:33:58

在风起云涌的AI江湖,NVIDIA凭借其CUDA生态和优秀的硬件大杀四方,立下赫赫战功,而另一家公司AMD也不甘示弱,带着他的生态解决方案ROCm开始了与不世出的NVIDA的正面硬钢,"ROCm is the answer to CUDA", AMD官网如是说。ROCm全称是Radeon Open Compute,从功能上,它是AMD提供的一套用于支持异构计算和GPU加速计算的开发工具和平台。

根据ROCm的官方介绍,这套框架不但支持AMD专业的计算卡,也支持AMD消费级的电脑显卡,之前有搭建过N卡CUDA的学习平台,这里尝试基于AMDGPU,搭建一个ROCm的学习平台。

平台信息

基于Ubuntu 20.04.6 LTS x86_64,显卡为AMD Ryzen 5 5600G with Radeon Graphics集显,应该是VEGA系列,支持VULKAN,OPENCL,当然,不支持CUDA。

搭建步骤

执行如下命令序列,添加用户组和安装ROCm一步到位:

sudo apt update && sudo apt dist-upgrade
sudo apt-get install wget gnupg2 
sudo usermod -a -G video $LOGNAME
sudo usermod -a -G render $LOGNAME
echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf
echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf
echo 'EXTRA_GROUPS=render' | sudo tee -a /etc/adduser.conf
sudo wget https://repo.radeon.com/amdgpu-install/22.10/ubuntu/focal/amdgpu-install_22.10.50100-1_all.deb
sudo apt-get install ./amdgpu-install_22.10.50100-1_all.deb 
sudo amdgpu-install --usecase=dkms
amdgpu-install -y --usecase=rocm
echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin' | sudo tee -a /etc/profile.d/rocm.sh

安装结束后,ROCm开发SDK将会出现在/opt目录下, 包括LLVM编译器, opencl, profile, tools等开发工具,便于用户开发自己的基于AMDGPU加速的应用程序。总的来说, ROCm SDK提供了完整的栈支持,包括运行时、编译器、调试器、性能分析工具等,以满足不同开发和优化需求。

AMDGPU编译器目录

测试环境

执行rocm-smi获取显卡设备信息,执行时发现在获取SCLK和功耗两个参数时报错,可能是SDK和显卡兼容方面的问题,不过没有影响后面的简单测试,所以也就没有理会。

AMD一直提倡一种混合异构架构(HSA),这种架构下CPU和GPU乃至各种异构算力被一视同仁,被纳入统一的编程模型开发,从rocminfo工具的输出可以看到这一点,每一个异构算力被认为是一个agent:

$ sudo /opt/rocm/bin/rocminfo

我的计算平台有两个AGENT,分别是CPU和GPU,CPU是通用算例,有12个compute unit,对应的是6核12线程SMT。

GPU Agent信息,计算单元数量,计算最大的TENSOR维数信息,QUEUE数量,WAVE(Warpper)大小等信息:

opencl支持信息

ROCm不直接支持CUDA,但是 support another GPU programming mode opencl,通过clinfo查看:

/opt/rocm/opencl/bin/clinfo

docker容器运行测试

下载pytorch环境的docker:

sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video --name pytorch rocm/pytorch:latest

之后,在docker终端中执行如下命令,验证对CUDA的支持(猜测是利用CUDA的生态,底层是CUDA转OpenCL的算子实现):

>>> import torch
>>> torch.cuda.is_available()

输出为TRUE,说明AMDGPU的硬件加速平台搭建成功了。

简单分析

细心的同学可能注意到,在启动docker的测试命令中,传入了一个设备参数--device=/dev/kfd给到DOCKER环境,这个非常重要,kfd本身就是代表AMDGPU异构计算的GPU设备驱动(KMD)的设备节点,它是用户操作GPU的基础。

架构上,AMDKFD驱动程序是Radeon GPU计算软件栈的Linux内核代码,也是该公司ROCm产品的一部分,从 功能上可以理解为在 DRM 子系统中提供了 CPU 与 GPU 沟通的快速通道,使得两者可以平等的访问内存资源而无需额外拷贝。

我们可以在运行上面的测试时,简单追踪以下对KFD内核驱动的调用,以验证加速环境确实使用的AMDGPU显卡,KFD是一个字符设备,所以可以追踪其中的几个FOPS调用,不出所料,在docker中执行如上命令后,内核中则追踪到了如下对KFD的调用栈:

这个例子说明了ROCm穿上了CUDA的外衣,摇身一变实现了对TF框架的支持,但是核心引擎仍然用的是AMDGPU自己的加速实现,包括AMD的编译器,AMD的底层KMD KFD,数学加速库等等实现。

ROCm和CUDA编程模型对比

1.NVIDIA和AMD都支持OpenCL,毕竟CL是个开放标准。

2.OpenACC对标OpenMP,网上有很多例子.

3.HIP对标Cuda,Cuda Source可以翻译为 HIP,然后由ROCm编译为AMDGPU上可运行的代码。

大概意思针对异构计算,深度系学习场景,双方各自都有底牌,目前OpenCL打个平手,HIP和CUDA相比较生态弱一些,OpenACC和OpenMP旗鼓相当。

AMDGPU对CUDA支持方式的分析

以下内容来源于网上的资料和自己不负责任的分析,基于一个原理,计算机中的任何问题都可以通过增加一个中间层来解决:

实现方式是针对hip api(Heterogeneous-Computing Interface for Portability)做CUDA的接口套壳,即将cuda的API接口作为标准接口,用AMD ROCm实现cuda的API(目的就是保证对外的API与CUDA完全相同),但实际调用 HIP+ROCm 的相关接口实现(即实际在A卡上运行),编译生成动态库libcuda*.so,并替换cuda相关动态库,从而完成适配,达到兼容CUDA生态的接口的目的。ROCm也提供了HIPIFY工具,用于将CUDA源代码转换为HIP源代码,实现CUDA代码到HIP的移植。

ROCm API libraries — ROCm Documentation

Hipify工具

HIP是 CUDA API 的”山寨克隆“版。除了一些不常用的功能(e.g. managed memory)外,几乎全盘拷贝 CUDA API,是 CUDA 的一个子集。HIP让开发人员能够使用HIPIFY将CUDA应用程序移植到ROCm,HIPIFY会自动转换CUDA应用程序成为HIP内核语言和运行时API,使用NVIDIA的CUDA编译器或AMDCLANG编译为目标GPU的运行代码。

基于AMDGPU如何运行OpenCL测试用例

参考网络上基于CPU算例的OpenCL的例子,写一个cl的helloworld demo,这个并不太难,因为OpenCL是开源机构Khronos Group定义的标准,在这个标准下所有的头文件,运行时标准等等都是定义好的,所以源码级没有太多改动,基本上拷贝过来就能跑。重点是分析其基于AMDGPU的执行机制。

下面是一份简单的opencl的代码,基本上和C没有什么差别,除了需要提供device端的代码,没有也没有关系,简单的测试用例可以只在主机上跑。

#include <stdio.h>
#include <stdlib.h>
#include <alloca.h>
#include <CL/cl.h>

void displayPlatformInfo(cl_platform_id id,
                         cl_platform_info param_name,
                         const char* paramNameAsStr)
{
    cl_int error = 0;
    size_t paramSize = 0;
    error = clGetPlatformInfo( id, param_name, 0, NULL, &paramSize );
    char* moreInfo = (char*)alloca( sizeof(char) * paramSize);
    error = clGetPlatformInfo( id, param_name, paramSize, moreInfo, NULL );
    if (error != CL_SUCCESS ) {
        perror("Unable to find any OpenCL platform information");
        return;
    }
    printf("%s: %s\n", paramNameAsStr, moreInfo);
}

int main(void) {

   /* OpenCL 1.1 data structures */
   cl_platform_id* platforms;

   /* OpenCL 1.1 scalar data types */
   cl_uint numOfPlatforms;
   cl_int  error;

   /* 
      Get the number of platforms 
      Remember that for each vendor's SDK installed on the computer,
      the number of available platform also increased. 
    */
   error = clGetPlatformIDs(0, NULL, &numOfPlatforms);
   if(error != CL_SUCCESS) {			
      perror("Unable to find any OpenCL platforms");
      exit(1);
   }

   // Allocate memory for the number of installed platforms.
   // alloca(...) occupies some stack space but is automatically freed on return
   platforms = (cl_platform_id*) alloca(sizeof(cl_platform_id) * numOfPlatforms);
   printf("Number of OpenCL platforms found: %d\n", numOfPlatforms);

   error = clGetPlatformIDs(numOfPlatforms, platforms, NULL);
   if(error != CL_SUCCESS) {			
      perror("Unable to find any OpenCL platforms");
      exit(1);
   }
   // We invoke the API 'clPlatformInfo' twice for each parameter we're trying to extract
   // and we use the return value to create temporary data structures (on the stack) to store
   // the returned information on the second invocation.
   for(cl_uint i = 0; i < numOfPlatforms; ++i) {
        displayPlatformInfo( platforms[i], CL_PLATFORM_PROFILE, "CL_PLATFORM_PROFILE" );
        displayPlatformInfo( platforms[i], CL_PLATFORM_VERSION, "CL_PLATFORM_VERSION" );
        displayPlatformInfo( platforms[i], CL_PLATFORM_NAME,    "CL_PLATFORM_NAME" );
        displayPlatformInfo( platforms[i], CL_PLATFORM_VENDOR,  "CL_PLATFORM_VENDOR" );
        displayPlatformInfo( platforms[i], CL_PLATFORM_EXTENSIONS, "CL_PLATFORM_EXTENSIONS" );
   }

   return 0;
}

编译

ROCm环境安装了OpenCL开发所需要的编译器,OpenCL运行时环境,以及标准的Khronos Group头文件,使用如下命令编译:

/opt/rocm/llvm/bin/clang opencl.c -I/opt/rocm/opencl/include -L/opt/rocm/opencl/lib -lOpenCL
编译,运行测试没有问题

strace追踪系统调用,发现opencl的测试用例确实打开了/dev/kfd设备节点,并对GPU进行IOCTL操作:

strace -tt -T -f -e trace=file,close,openat,ioctl -o strace.log ./a.out

这个用例比较简单,调用的IOCTL 列表如下:

开发OpenCL Kernel测试用例

前面的例子只有主机侧的代码,没有GPU运行的代码,实际上没有调用AMDGPU的异构计算能力,参考网上的代码,写一个实现两个一维向量加和的kernel,投到AMDGPU上得到计算结果:

#include <stdio.h>
#include <stdlib.h>
#include <alloca.h>
#include <CL/cl.h>
#pragma warning( disable : 4996 )
int main() {
	cl_int error;
	cl_platform_id platforms;
 
	cl_device_id devices;
 
	cl_context context;
 
	FILE *program_handle;
	size_t program_size;
	char *program_buffer;
	cl_program program;
 
	size_t log_size;
	char *program_log;
 
	char kernel_name[] = "createBuffer";
	cl_kernel kernel;
 
	cl_command_queue queue;
	//获取平台
	error = clGetPlatformIDs(1, &platforms, NULL);
	if (error != 0) {
		printf("Get platform failed!");
		return -1;
	}
	//获取设备
	error = clGetDeviceIDs(platforms, CL_DEVICE_TYPE_GPU, 1, &devices, NULL);
	if (error != 0) {
		printf("Get device failed!");
		return -1;
	}
	//创建上下文
	context = clCreateContext(NULL,1,&devices,NULL,NULL,&error);
	if (error != 0) {
		printf("Creat context failed!");
		return -1;
	}
	//创建程序;注意要用"rb"
	program_handle = fopen("kernel.cl","rb");
	if (program_handle == NULL) {
		printf("The kernle can not be opened!");
		return -1;
	}
	fseek(program_handle,0,SEEK_END);
	program_size = ftell(program_handle);
	rewind(program_handle);
 
	program_buffer = (char *)malloc(program_size+1);
	program_buffer[program_size] = '\0';
	error=fread(program_buffer,sizeof(char),program_size,program_handle);
	if (error == 0) {
		printf("Read kernel failed!");
		return -1;
	}
	fclose(program_handle);
	program = clCreateProgramWithSource(context,1,(const char **)&program_buffer, 
	                                     &program_size,&error);
	if (error < 0) {
		printf("Couldn't create the program!");
		return -1;
	}
	//编译程序
	error = clBuildProgram(program,1,&devices,NULL,NULL,NULL);
	if (error < 0) {
		//确定日志文件的大小
		clGetProgramBuildInfo(program,devices,CL_PROGRAM_BUILD_LOG,0,NULL,&log_size);
		program_log = (char *)malloc(log_size+1);
		program_log[log_size] = '\0';
		//读取日志
		clGetProgramBuildInfo(program, devices, CL_PROGRAM_BUILD_LOG, 
		                       log_size+1, program_log, NULL);
		printf("%s\n",program_log);
		free(program_log);
		return -1;
	}
	free(program_buffer);
	//创建命令队列
	queue = clCreateCommandQueue(context, devices, CL_QUEUE_PROFILING_ENABLE, &error);
	if (error < 0) {
		printf("Coudn't create the command queue");
		return -1;
	}
	//创建内核
	kernel = clCreateKernel(program,kernel_name,&error);
	if (kernel==NULL) {
		printf("Couldn't create kernel!\n");
		return -1;
	}
	//初始化参数
	float result[100];
	float a_in[100];
	float b_in[100];
	for (int i = 0; i < 100; i++) {
		a_in[i] = i;
		b_in[i] = i*2.0;
	}
	//创建缓存对象
	cl_mem memObject1 = clCreateBuffer(context,CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,sizeof(float)*100,a_in,&error);
	if (error < 0) {
		printf("Creat memObject1 failed!\n");
		return -1;
	}
	cl_mem memObject2 = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, 
	                                    sizeof(float) * 100, b_in, &error);
	if (error < 0) {
		printf("Creat memObject2 failed!\n");
		return -1;
	}
	cl_mem memObject3 = clCreateBuffer(context, CL_MEM_WRITE_ONLY , 
	                                       sizeof(float) * 100, NULL, &error);
	if (error < 0) {
		printf("Creat memObject3 failed!\n");
		return -1;
	}
	//设置内核参数
	error = clSetKernelArg(kernel,0,sizeof(cl_mem),&memObject1);
	error|= clSetKernelArg(kernel, 1, sizeof(cl_mem), &memObject2);
	error |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &memObject3);
	if (error != CL_SUCCESS) {
		printf("Error setting kernel arguments!\n");
		return -1;
	}
	//执行内核
	size_t globalWorkSize[1] = {100};
	size_t localWorkSize[1] = {1};
	error = clEnqueueNDRangeKernel(queue,kernel,1,NULL,globalWorkSize, 
	                                localWorkSize,0,NULL,NULL);
	if (error != CL_SUCCESS) {
		printf("Error queuing kernel for execution!\n");
		return -1;
	}
	//读取执行结果
	error = clEnqueueReadBuffer(queue,memObject3,CL_TRUE,0,100*sizeof(float), 
	                             result,0,NULL,NULL);
	if (error != CL_SUCCESS) {
		printf("Error reading result buffer!\n");
		return -1;
	}
	//显示结果
	for (int i = 0; i < 100; i++) {
		printf("%f ",result[i]);
	}
	printf("\n");
	//释放资源
	clReleaseDevice(devices);
	clReleaseContext(context);
	clReleaseCommandQueue(queue);
	clReleaseProgram(program);
	clReleaseKernel(kernel);
	clReleaseMemObject(memObject1);
	clReleaseMemObject(memObject2);
	clReleaseMemObject(memObject3);
	return 0;
}

设备端代码:

__kernel void createBuffer(__global const float *a_in,
	__global const float *b_in,
	__global float *result) {
int gid = get_global_id(0);
result[gid] = a_in[gid] + b_in[gid];
}

编译命令不变,kernel.cl会被主文件读入,然后被ROCm动态编译为GPU端指令,通过ROCm runtime加载道GPU端运行,得到计算结果,计算结果符合预期:

作为驱动开发者,实际上最关心的是KFD端的调用序列,通过追踪可以看到,此时由于加入了设备端计算的功能,KFD的IOCTL调用序列明显比前面长了好多,其中包括了COMMAND QUEUE创建的IOCTL也被调用到,因为设备端代码要通过COMMAND QUEUE传递给AMDGPU去执行。

完整的KFD调用序列记录如下,方便以后分析:

5684  00:08:54.545211 ioctl(5, AMDKFD_IOC_GET_VERSION, 0x7ffe0edc1e00) = 0 <0.000005>
5684  00:08:54.549152 ioctl(5, AMDKFD_IOC_GET_PROCESS_APERTURES_NEW, 0x7ffe0edc1ab0) = 0 <0.000005>
5684  00:08:54.549169 ioctl(5, AMDKFD_IOC_ACQUIRE_VM, 0x7ffe0edc1ab0) = 0 <0.000033>
5684  00:08:54.549262 ioctl(5, AMDKFD_IOC_SET_MEMORY_POLICY, 0x7ffe0edc1ab0) = 0 <0.000004>
5684  00:08:54.549301 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1970) = 0 <0.000007>
5684  00:08:54.549333 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1990) = 0 <0.000091>
5684  00:08:54.581556 ioctl(5, AMDKFD_IOC_GET_CLOCK_COUNTERS, 0x7ffe0edc1a40) = 0 <0.000006>
5684  00:08:54.581583 ioctl(5, AMDKFD_IOC_GET_CLOCK_COUNTERS, 0x7ffe0edc1e30) = 0 <0.000004>
5684  00:08:54.581626 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1b90) = 0 <0.000012>
5684  00:08:54.581677 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1c60) = 0 <0.000239>
5684  00:08:54.581933 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1db0) = 0 <0.000014>
5684  00:08:54.582012 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1ad0) = 0 <0.000022>
5684  00:08:54.582043 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1b00) = 0 <0.000029>
5684  00:08:54.582084 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1ca0) = 0 <0.000004>
5684  00:08:54.582259 ioctl(5, AMDKFD_IOC_SET_SCRATCH_BACKING_VA, 0x7ffe0edc1d40) = 0 <0.000002>
5684  00:08:54.582322 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1750) = 0 <0.000010>
5684  00:08:54.582342 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1780) = 0 <0.000031>
5684  00:08:54.582388 ioctl(5, AMDKFD_IOC_SET_TRAP_HANDLER, 0x7ffe0edc1e10) = 0 <0.000006>
5687  00:08:54.582511 ioctl(5, AMDKFD_IOC_WAIT_EVENTS <unfinished ...>
5684  00:08:54.587645 ioctl(5, AMDKFD_IOC_GET_TILE_CONFIG, 0x7ffe0edc1310) = 0 <0.000005>
5684  00:08:54.587777 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc19d0) = 0 <0.000153>
5684  00:08:54.587945 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc19e0) = 0 <0.000246>
5684  00:08:54.588269 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc19d0) = 0 <0.000146>
5684  00:08:54.588429 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc19e0) = 0 <0.000035>
5684  00:08:54.588477 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1be0) = 0 <0.000007>
5689  00:08:54.650506 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef744e0) = 0 <0.000028>
5689  00:08:54.650547 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74510) = 0 <0.000222>
5689  00:08:54.650832 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74490) = 0 <0.000009>
5689  00:08:54.650851 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef744c0) = 0 <0.000035>
5689  00:08:54.650897 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74700) = 0 <0.000006>
5689  00:08:54.650964 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74240) = 0 <0.000008>
5689  00:08:54.650982 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef742b0) = 0 <0.000034>
5689  00:08:54.651053 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74200) = 0 <0.000008>
5689  00:08:54.651090 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef742b0) = 0 <0.000036>
5689  00:08:54.651186 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74240) = 0 <0.000376>
5689  00:08:54.651573 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef742b0) = 0 <0.000114>
5689  00:08:54.651697 ioctl(5, AMDKFD_IOC_CREATE_QUEUE, 0x7fae7ef74420) = 0 <0.000177>
5689  00:08:54.651914 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef742c0) = 0 <0.000008>
5689  00:08:54.651945 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74330) = 0 <0.000025>
5689  00:08:54.651981 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7fae7ef74710) = 0 <0.000007>
5689  00:08:54.652000 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7fae7ef74710) = 0 <0.000006>
5689  00:08:54.652067 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74510) = 0 <0.000010>
5689  00:08:54.652086 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74540) = 0 <0.000024>
5689  00:08:54.652175 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef745f0) = 0 <0.000009>
5689  00:08:54.652196 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74620) = 0 <0.000028>
5687  00:08:54.652231 ioctl(5, AMDKFD_IOC_WAIT_EVENTS, 0x7fae7f7e8c20) = 0 <0.000008>
5687  00:08:54.652251 ioctl(5, AMDKFD_IOC_WAIT_EVENTS <unfinished ...>
5689  00:08:54.652295 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef745a0) = 0 <0.000048>
5689  00:08:54.652353 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef745d0) = 0 <0.000029>
5689  00:08:54.652446 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74350) = 0 <0.000010>
5689  00:08:54.652467 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef743c0) = 0 <0.000024>
5689  00:08:54.652531 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74310) = 0 <0.000008>
5689  00:08:54.652565 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef743c0) = 0 <0.000030>
5689  00:08:54.652654 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74350) = 0 <0.000390>
5689  00:08:54.653057 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef743c0) = 0 <0.000119>
5689  00:08:54.653190 ioctl(5, AMDKFD_IOC_CREATE_QUEUE, 0x7fae7ef74530) = 0 <0.000068>
5689  00:08:54.653271 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7fae7ef74820) = 0 <0.000007>
5689  00:08:54.653290 ioctl(5, AMDKFD_IOC_SET_EVENT <unfinished ...>
5689  00:08:54.653366 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74620) = 0 <0.000010>
5689  00:08:54.653386 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74650) = 0 <0.000025>
5689  00:08:54.653480 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU <unfinished ...>
5687  00:08:54.653532 ioctl(5, AMDKFD_IOC_WAIT_EVENTS, 0x7fae7f7e8c20) = 0 <0.000005>
5687  00:08:54.653551 ioctl(5, AMDKFD_IOC_WAIT_EVENTS <unfinished ...>
5689  00:08:54.653590 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74a30) = 0 <0.000025>
5689  00:08:54.836246 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef73c80) = 0 <0.000022>
5689  00:08:54.836303 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef73cf0) = 0 <0.000049>
5689  00:08:54.836417 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef73da0) = 0 <0.000019>
5689  00:08:54.836447 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef73dd0) = 0 <0.000031>
5689  00:08:54.836509 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74690) = 0 <0.000007>
5689  00:08:54.836582 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74590) = 0 <0.000010>
5689  00:08:54.836603 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef745c0) = 0 <0.000029>
5689  00:08:54.836696 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74140) = 0 <0.000007>
5689  00:08:54.836714 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74170) = 0 <0.000023>
5689  00:08:54.836800 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74140) = 0 <0.000007>
5689  00:08:54.836818 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74170) = 0 <0.000027>
5689  00:08:54.837235 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74140) = 0 <0.000018>
5689  00:08:54.837266 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74170) = 0 <0.000085>
5689  00:08:54.837497 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74840) = 0 <0.000009>
5689  00:08:54.837519 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74840) = 0 <0.000025>
5689  00:08:54.837558 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000007>
5689  00:08:54.837577 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000005>
5689  00:08:54.837599 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000007>
5689  00:08:54.837618 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837632 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837647 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837662 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837676 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837691 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837705 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000008>
5689  00:08:54.837723 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837738 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837752 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837766 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000004>
5689  00:08:54.837835 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef748b0) = 0 <0.000011>
5689  00:08:54.837857 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef748e0) = 0 <0.000080>
5689  00:08:54.837951 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.837971 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000009>
5689  00:08:54.837999 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000009>
5689  00:08:54.838024 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838046 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838067 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838088 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838109 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838130 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838151 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838172 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838193 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838214 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838236 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838257 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838278 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838299 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5689  00:08:54.838321 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74b50) = 0 <0.000006>
5684  00:08:54.838720 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc14a0) = 0 <0.000020>
5684  00:08:54.838757 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc14d0) = 0 <0.000032>
5684  00:08:54.838963 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1840) = 0 <0.000014>
5684  00:08:54.838994 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1870) = 0 <0.000032>
5684  00:08:54.839099 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc17f0) = 0 <0.000044>
5684  00:08:54.839154 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1820) = 0 <0.000024>
5684  00:08:54.839247 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc15a0) = 0 <0.000008>
5684  00:08:54.839266 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1610) = 0 <0.000027>
5684  00:08:54.839331 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1560) = 0 <0.000007>
5684  00:08:54.839366 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1610) = 0 <0.000032>
5684  00:08:54.839457 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc15a0) = 0 <0.000281>
5684  00:08:54.839754 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1610) = 0 <0.000104>
5684  00:08:54.839873 ioctl(5, AMDKFD_IOC_CREATE_QUEUE, 0x7ffe0edc1780) = 0 <0.000162>
5684  00:08:54.840049 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7ffe0edc1a70) = 0 <0.000006>
5684  00:08:54.840071 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7ffe0edc1a70) = 0 <0.000003>
5684  00:08:54.840148 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1870) = 0 <0.000010>
5684  00:08:54.840169 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc18a0) = 0 <0.000032>
5684  00:08:54.840267 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU <unfinished ...>
5687  00:08:54.840306 ioctl(5, AMDKFD_IOC_WAIT_EVENTS, 0x7fae7f7e8c20) = 0 <0.000007>
5687  00:08:54.840324 ioctl(5, AMDKFD_IOC_WAIT_EVENTS <unfinished ...>
5684  00:08:54.840351 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1c80) = 0 <0.000033>
5684  00:08:54.840433 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc1a90) = 0 <0.000007>
5684  00:08:54.840452 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1a90) = 0 <0.000024>
5684  00:08:54.840489 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000005>
5684  00:08:54.840506 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840520 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840534 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840548 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840562 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840576 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840589 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840603 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840617 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840630 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840644 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840658 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840672 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840686 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840700 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840714 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840728 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840741 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840755 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840769 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840782 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840796 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840810 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840824 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840837 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840854 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840869 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840883 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000009>
5684  00:08:54.840902 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000004>
5684  00:08:54.840916 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840930 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1da0) = 0 <0.000003>
5684  00:08:54.840998 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc19b0) = 0 <0.000130>
5684  00:08:54.841144 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc19e0) = 0 <0.000119>
5684  00:08:54.841363 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffe0edc16e0) = 0 <0.000008>
5684  00:08:54.841382 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffe0edc1750) = 0 <0.000023>
5684  00:08:54.841416 ioctl(5, AMDKFD_IOC_CREATE_QUEUE, 0x7ffe0edc18c0) = 0 <0.000025>
5684  00:08:54.841452 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1b60) = 0 <0.000004>
5684  00:08:54.841466 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7ffe0edc1b60) = 0 <0.000004>
5689  00:08:54.841551 ioctl(5, AMDKFD_IOC_GET_CLOCK_COUNTERS, 0x7fae7ef748a0) = 0 <0.000007>
5689  00:08:54.841637 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef745d0) = 0 <0.000150>
5689  00:08:54.841800 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74600) = 0 <0.000090>
5689  00:08:54.841996 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef74300) = 0 <0.000010>
5689  00:08:54.842018 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74370) = 0 <0.000024>
5689  00:08:54.842054 ioctl(5, AMDKFD_IOC_CREATE_QUEUE, 0x7fae7ef744e0) = 0 <0.000033>
5689  00:08:54.842099 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74780) = 0 <0.000005>
5689  00:08:54.842172 ioctl(5, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7fae7ef744e0) = 0 <0.000011>
5689  00:08:54.842194 ioctl(5, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7fae7ef74510) = 0 <0.000029>
5689  00:08:54.842235 ioctl(5, AMDKFD_IOC_CREATE_EVENT, 0x7fae7ef74780) = 0 <0.000004>
5689  00:08:54.842440 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7fae7ef74c30) = 0 <0.000011>
5689  00:08:54.842468 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7fae7ef74c70) = 0 <0.000011>
5689  00:08:54.842517 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7fae7ef74bf0) = 0 <0.000006>
5689  00:08:54.842536 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7fae7ef74c40) = 0 <0.000010>
5689  00:08:54.842603 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7fae7ef74d10) = 0 <0.000006>
5687  00:08:54.842634 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7fae7f7e8da0) = 0 <0.000007>
5689  00:08:54.842654 ioctl(5, AMDKFD_IOC_DESTROY_QUEUE <unfinished ...>
5687  00:08:54.842660 ioctl(5, AMDKFD_IOC_SET_EVENT, 0x7fae7f7e8df0) = 0 <0.000007>
5689  00:08:54.842752 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7fae7ef74c30) = 0 <0.000007>
5689  00:08:54.842770 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7fae7ef74c80) = 0 <0.000006>
5689  00:08:54.842841 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU <unfinished ...>
5687  00:08:54.842879 ioctl(5, AMDKFD_IOC_WAIT_EVENTS <unfinished ...>
5689  00:08:54.842899 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU <unfinished ...>
5687  00:08:54.842906 ioctl(5, AMDKFD_IOC_WAIT_EVENTS <unfinished ...>
5689  00:08:54.843095 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7fae7ef74c50) = 0 <0.000006>
5689  00:08:54.843114 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7fae7ef74ca0) = 0 <0.000007>
5689  00:08:54.843151 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7fae7ef74ba0) = 0 <0.000006>
5689  00:08:54.843167 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7fae7ef74bf0) = 0 <0.000007>
5689  00:08:54.843220 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7fae7ef74bc0) = 0 <0.000006>
5689  00:08:54.843239 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7fae7ef74c10) = 0 <0.000006>
5689  00:08:54.843279 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7fae7ef74bc0) = 0 <0.000006>
5689  00:08:54.843295 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7fae7ef74c10) = 0 <0.000007>
5684  00:08:54.843339 ioctl(5, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, 0x7ffe0edc1f60) = 0 <0.000007>
5684  00:08:54.843355 ioctl(5, AMDKFD_IOC_FREE_MEMORY_OF_GPU, 0x7ffe0edc1fb0) = 0 <0.000008>
runtime arch

参考资料

百度安全验证

AMD的ROCM平台是什么? - 知乎

https://cgmb-rocm-docs.readthedocs.io/_/downloads/en/latest/pdf/

OpenCL编程详细解析与实例 - 知乎

第1章 简介异构计算 - OpenCL 2.0 异构计算 - 开发文档 - 文江博客

HIP-ROCM架构概述 - 知乎

ROCm兼容cuda方案和验证 - 知乎

ROCm API libraries — ROCm Documentation

源码编译安装ROCm以运行tensorflow-rocm(适用于Ubuntu 23.04) - 小蓝博客

AMD 推出 HIP SDK:拓展 ROCm 方案,为 CUDA 应用程序提供支持_腾讯新闻

https://dev.to/shawonashraf/setting-up-your-amd-gpu-for-tensorflow-in-ubuntu-20-04-31f5

GPU memory — ROCm Documentation

Index of /amdgpu-install/

https://www.cnblogs.com/lllzhuang/articles/16083003.html

结束

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1471748.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

MySQL锁三部曲:临键、间隙与记录的奇妙旅程

欢迎来到我的博客&#xff0c;代码的世界里&#xff0c;每一行都是一个故事 MySQL锁三部曲&#xff1a;临键、间隙与记录的奇妙旅程 前言临键锁的奥秘间隙锁记录锁 前言 在数据库世界中&#xff0c;锁是维护数据完整性的一种关键机制。而MySQL中的临键锁、间隙锁和记录锁则是锁…

2024.2.25 -ElasticSearch 进阶

倒排索引 Elasticsearch的倒排索引机制是通过将文档中出现的词汇与它们所在的文档ID关联起来&#xff0c;实现快速查找包含特定词汇的文档。下面是一个具体的例子来说明倒排索引的工作原理&#xff1a; 假设我们有一个简单的文章集合&#xff0c;包含以下三篇文章&#xff1a…

pytorch 用F.normalization的逆归一化如何操作

逆归一化的时候再把这个数乘回去就行了 magnitude a.norm(p2, dim1, keepdimTrue) # NEW atorch.nn.functional.normalize(a, p2, dim1) a_or a* magnitude # NEW print(a_or) Outputs: tensor([]1,2,3)

springboot网站开发02-接入持久层框架mybatisPlus

springboot网站开发02-接入持久层框架mybatisPlus&#xff01;经过上一小节内容分享&#xff0c;我们的项目嵌套模式框架搭建好了&#xff0c;下面就是开始编辑具体的业务代码了&#xff0c;我们使用到了持久层框架是mybatisPlus插件。下面是一些具体的植入框架的操作步骤。 第…

【Java程序员面试专栏 算法思维】二 高频面试算法题:二分查找

一轮的算法训练完成后,对相关的题目有了一个初步理解了,接下来进行专题训练,以下这些题目就是汇总的高频题目,本篇主要聊聊二分查找,包括基础二分,寻找目标值的左右边界,搜索旋转数组以及波峰,以及x的平方根问题,所以放到一篇Blog中集中练习 题目关键字解题思路时间空…

【MySQL面试复习】索引创建的原则有哪些?

系列文章目录 在MySQL中&#xff0c;如何定位慢查询&#xff1f; 发现了某个SQL语句执行很慢&#xff0c;如何进行分析&#xff1f; 了解过索引吗&#xff1f;(索引的底层原理)/B 树和B树的区别是什么&#xff1f; 什么是聚簇索引&#xff08;聚集索引&#xff09;和非聚簇索引…

MAC M1 安装mongodb7.0.5 版本

1、进入官网 Download MongoDB Community Server | MongoDBDownload MongoDB Community Server non-relational database to take your next big project to a higher level!https://www.mongodb.com/try/download/community 2、选择版本 3、下载后解压 放到 /usr/local 并修改…

好狄空气能热水器成功上榜2023年消费者心中的十大信赖品牌

随着环保意识的增强和能源消耗的持续关注&#xff0c;空气能热水器以其高效节能、绿色环保的特性赢得了越来越多消费者的青睐。市场上琳琅满目的空气能热水器品牌让消费者在选择时既兴奋又困惑。究竟哪些品牌能在激烈的竞争中脱颖而出&#xff0c;成为消费者心目中的佼佼者呢&a…

MongoDB之客户端工具与核心概念及基本类型篇

MongoDB之客户端工具与核心概念及基本类型篇 文章目录 MongoDB之客户端工具与核心概念及基本类型篇1. MongoDB是什么?1. 关于MongoDB2. 相关客户端工具1. MongoDB Compass2. Studio 3T3. Navicat for MongoDB4. NoSQL Manager for MongoDB Professional 2.MongoDB相关概念2.1 …

RabbitMQ实战学习

RabbitMQ实战学习 文章目录 RabbitMQ实战学习RabbitMQ常用资料1、安装教程2、使用安装包3、常用命令4、验证访问5、代码示例 一、RabbitMQ基本概念1.1. MQ概述1.2 MQ 的优势和劣势1.3 MQ 的优势1. 应用解耦2. 异步提速3. 削峰填谷 1.4 MQ 的劣势1.5 RabbitMQ 基础架构1.6 JMS 二…

钡铼lora智能网关终端节点温湿度无线采集4G远传

钡铼LoRa智能网关终端节点是一种用于温湿度无线采集和4G远传的设备&#xff0c;它能够实现远程监测和数据传输&#xff0c;适用于各种应用场景&#xff0c;包括工业、农业、环境监测等领域。在设置钡铼LoRa智能网关终端节点时&#xff0c;我们需要考虑到设备的功能特点、网络连…

C++ 入门(六)— 调试程序(Debugging)

文章目录 语法和语义错误调试(Debugging)调试过程调试策略常用的调试策略更多调试策略 使用集成调试器步进&#xff08;Stepping&#xff09; 语法和语义错误 语法错误 编写根据 C 语言的语法无效的语句时&#xff0c;会发生语法错误。例如缺少分号、使用未声明的变量、括号或…

字符函数和字符串函数(C语言进阶)(三)

目录 前言 接上篇&#xff1a; 1.7 strtok 1.8 strerror 1.9 字符分类函数 总结 前言 C语言中对字符和字符串的处理是很频繁的&#xff0c;但是c语言本身是没有字符串类型的&#xff0c;字符串通常放在常量字符串中或着字符数组中。 字符串常量适用于那些对它不做修改的字…

生成式 AI - Diffusion 模型的数学原理(5)

来自 论文《 Denoising Diffusion Probabilistic Model》&#xff08;DDPM&#xff09; 论文链接&#xff1a; https://arxiv.org/abs/2006.11239 Hung-yi Lee 课件整理 讲到这里还没有解决的问题是&#xff0c;为什么这里还要多加一个噪声。Denoise模型算出来的是高斯分布的均…

根据前序后序遍历求出二叉树

根据前序后序遍历求出二叉树 一、题目描述 给定两个整数数组&#xff0c;preorder 和 postorder &#xff0c;其中 preorder 是一个具有 无重复 值的二叉树的前序遍历&#xff0c;postorder 是同一棵树的后序遍历&#xff0c;重构并返回二叉树。 二、题目分析 需求&#xff…

成都直播基地作为产业重要载体,引领直播行业健康、多元发展

近年来&#xff0c;我国网络直播行业呈现出井喷式的发展态势。众多直播平台如雨后春笋般涌现&#xff0c;直播内容丰富多样&#xff0c;涵盖游戏、电竞、美食、旅游、教育等多个领域。同时&#xff0c;成都直播产业园规模持续扩大&#xff0c;产业不断完善&#xff0c;整体呈现…

如何在Win系统搭建Oracle数据库并实现远程访问【内网穿透】

文章目录 前言1. 数据库搭建2. 内网穿透2.1 安装cpolar内网穿透2.2 创建隧道映射 3. 公网远程访问4. 配置固定TCP端口地址4.1 保留一个固定的公网TCP端口地址4.2 配置固定公网TCP端口地址4.3 测试使用固定TCP端口地址远程Oracle 前言 Oracle&#xff0c;是甲骨文公司的一款关系…

基于Java SSM框架实现高考填报信息系统项目【项目源码】计算机毕业设计

基于java的SSM框架实现高考填报信息系统演示 JAVA简介 Java主要采用CORBA技术和安全模型&#xff0c;可以在互联网应用的数据保护。它还提供了对EJB&#xff08;Enterprise JavaBeans&#xff09;的全面支持&#xff0c;java servlet API&#xff0c;JSP&#xff08;java serv…

【前沿热点视觉算法】-带有信道坐标注意特征融合模块的双光谱语义分割网络

计算机视觉算法分享。问题或建议&#xff0c;请文章私信或者文章末尾扫码加微信留言。 1 论文题目 带有信道坐标注意特征融合模块的双光谱语义分割网络 2 论文摘要 双光谱&#xff08;RGB-hehtir&#xff09;语义分割是在恶劣成像环境&#xff08;如黑暗、雨、雾&#xff09…

SpringCloud Alibaba 2022之Nacos学习

SpringCloud Alibaba 2022使用 SpringCloud Alibaba 2022需要Spring Boot 3.0以上的版本&#xff0c;同时JDK需要是17及以上的版本。具体的可以看官网的说明。 Spring Cloud Alibaba版本说明 环境搭建 这里搭建的是一个聚合项目。项目结构如下&#xff1a; 父项目的pom.xm…