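OpenCL Platforms
As discussed in Section 2.1, the first step of an OpenCL application is to query the set of OpenCL platforms and choose one or more of them to use in the application. Associated with a platform is a profile, which describes the capabilities of the particular OpenCL version supported. A profile is either the full profile, which covers all functionality defined as part of the core specification, or the embedded profile, which is defined as a subset of the full profile that drops some of the requirements for conformance with the IEEE 754 standard.
The set of platforms can be queried with the following command: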
extern CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformIDs(cl_uint num_entries,
cl_platform_id * platforms,
cl_uint * num_platforms) CL_API_SUFFIX__VERSION_1_0;
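This command obtains the list of available OpenCL platforms. If the argument platforms is NULL, clGetPlatformIDs returns the number of available platforms. The number of platforms returned can be limited with num_entries, which, when platforms is not NULL, must be greater than 0 and less than or equal to the number of available platforms.
The number of available platforms can be queried by setting num_entries and platforms to 0 and NULL, respectively. With Apple's OpenCL implementation this step is unnecessary: rather than passing a queried platform to other API calls such as clGetDeviceIDs(), the value NULL can be passed directly.
As a simple example of how to query and select a platform, the following code uses clGetPlatformIDs() to obtain a list of platform IDs: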
cl_int errNum;
cl_uint numPlatforms;
cl_platform_id* platformIds;
cl_context context = NULL;
errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id) * numPlatforms);
errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
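The fragment above ignores errNum and sizes its buffer with alloca. The count-then-fetch idiom it relies on can be sketched in a form that checks both return codes and uses std::vector instead. This is only an illustration under stated assumptions: fakeGetIDs is a hypothetical stand-in with the same argument shape as clGetPlatformIDs so that the sketch compiles without the OpenCL headers, and queryAll is a helper name invented here, not part of the OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an OpenCL-style enumerator such as
// clGetPlatformIDs: it reports how many items exist through `count`
// and, when `out` is non-null, copies at most `capacity` items into it.
// Returns 0 on success (mirroring CL_SUCCESS) and -1 on bad arguments.
inline int fakeGetIDs(unsigned capacity, int* out, unsigned* count)
{
    static const int ids[] = {101, 202, 303};
    const unsigned total = 3;
    if (out == nullptr && count == nullptr) return -1;
    if (out != nullptr && capacity == 0) return -1;
    if (count != nullptr) *count = total;
    if (out != nullptr)
    {
        for (unsigned i = 0; i < capacity && i < total; ++i)
        {
            out[i] = ids[i];
        }
    }
    return 0;
}

// Two-call idiom: first ask for the count, then size a buffer and fetch.
// Returns an empty vector if either call fails or nothing is available.
template <typename Id, typename Query>
std::vector<Id> queryAll(Query query)
{
    unsigned count = 0;
    if (query(0u, static_cast<Id*>(nullptr), &count) != 0 || count == 0)
    {
        return std::vector<Id>();
    }
    std::vector<Id> items(count);
    if (query(count, items.data(), static_cast<unsigned*>(nullptr)) != 0)
    {
        return std::vector<Id>();
    }
    return items;
}
```

With the real API, the callable would presumably be a thin wrapper around clGetPlatformIDs and Id would be cl_platform_id; the vector also avoids the stack-size limits that come with alloca.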
Given a platform, its properties can be queried with the following command:
extern CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformInfo(cl_platform_id platform,
cl_platform_info param_name,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0;
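This command returns specific information about the OpenCL platform. The valid queries for param_name are listed below. The size of a returned value can be queried by setting param_value_size and param_value to 0 and NULL, respectively.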
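cl_platform_info  Return type  Description
CL_PLATFORM_PROFILE  char[]  OpenCL profile string. The profile is one of the following two strings:
    FULL_PROFILE: the OpenCL implementation supports all functionality defined as part of the core specification.
    EMBEDDED_PROFILE: the OpenCL implementation supports a subset of the functionality defined as part of the core specification.
CL_PLATFORM_VERSION  char[]  OpenCL version string.
CL_PLATFORM_NAME  char[]  Platform name string.
CL_PLATFORM_VENDOR  char[]  Platform vendor string.
CL_PLATFORM_EXTENSIONS  char[]  List of the extension names supported by the platform.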
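The CL_PLATFORM_EXTENSIONS string is a space-separated list, so testing for one extension is safer on whole tokens than with a substring search. The following is a minimal sketch of that idea; splitExtensions and hasExtension are helper names invented for illustration, and the sample string in the usage note is made up rather than taken from a real platform.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a space-separated extension string (the format returned for
// CL_PLATFORM_EXTENSIONS) into individual extension names.
inline std::vector<std::string> splitExtensions(const std::string& extensions)
{
    std::vector<std::string> names;
    std::istringstream stream(extensions);
    std::string name;
    while (stream >> name)
    {
        names.push_back(name);
    }
    return names;
}

// Whole-token match, so "cl_khr_fp" does not match "cl_khr_fp64".
inline bool hasExtension(const std::string& extensions,
                         const std::string& wanted)
{
    const std::vector<std::string> names = splitExtensions(extensions);
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        if (names[i] == wanted) return true;
    }
    return false;
}
```

For example, hasExtension("cl_khr_icd cl_khr_fp64", "cl_khr_fp64") is true, while the same call with "cl_khr_fp" is false.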
As a simple example of how to query and select a platform, the following code uses clGetPlatformInfo() to obtain the associated platform name and vendor strings:
cl_int errNum;
std::size_t paramValueSize;
// First call: ask how many bytes the platform name needs.
errNum = clGetPlatformInfo(id, CL_PLATFORM_NAME, 0, NULL, &paramValueSize);
char* name = (char*)alloca(sizeof(char) * paramValueSize);
// Second call: fetch the name into the buffer just allocated.
errNum = clGetPlatformInfo(id, CL_PLATFORM_NAME, paramValueSize, name, NULL);
// Repeat the size-then-fetch pattern for the vendor string.
errNum = clGetPlatformInfo(id, CL_PLATFORM_VENDOR, 0, NULL, &paramValueSize);
char* vname = (char*)alloca(sizeof(char) * paramValueSize);
errNum = clGetPlatformInfo(id, CL_PLATFORM_VENDOR, paramValueSize, vname, NULL);
std::cout << "Platform name: " << name << std::endl
          << "Vendor name: " << vname << std::endl;
With the ATI Stream SDK, this code displays:
Platform name: ATI Stream
Vendor name: Advanced Micro Devices, Inc.
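OpenCL Devices
Each platform may have an associated set of compute devices on which the application executes code. Given a platform, the list of supported devices can be queried with the following command: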
extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceIDs(cl_platform_id platform,
cl_device_type device_type,
cl_uint num_entries,
cl_device_id * devices,
cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0;
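This command obtains the list of available OpenCL devices associated with platform. If the argument devices is NULL, clGetDeviceIDs returns the number of devices. The number of devices returned can be limited with num_entries, where 0 < num_entries ≤ number of devices.
The type of compute device is specified by the argument device_type and can be one of the values given in Table 3-2. Each device shares the execution and memory models described in Section 1.1 and shown in Figures 1-6, 1-7, and 1-8.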
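cl_device_type  Description
CL_DEVICE_TYPE_CPU  An OpenCL device that is the host processor.
CL_DEVICE_TYPE_GPU  An OpenCL device that is a GPU.
CL_DEVICE_TYPE_ACCELERATOR  An OpenCL accelerator (for example, IBM Cell Broadband).
CL_DEVICE_TYPE_DEFAULT  The default device.
CL_DEVICE_TYPE_ALL  All OpenCL devices associated with the corresponding platform.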
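A CPU device is a single homogeneous device that maps to the set of available cores or to a subset of them. CPUs are typically optimized with large caches to reduce latency; AMD's Opteron series and Intel's Core i7 family are examples.
A GPU device corresponds to a throughput-optimized device for graphics and general-purpose computing. Well-known examples include ATI's Radeon family and NVIDIA's GTX series.
An accelerator device covers a wide range of devices, from IBM's Cell Broadband architecture to less well-known DSP-style devices.
The default device and all devices options allow the OpenCL runtime to designate a "preferred" device and all available devices, respectively.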
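A device type is a bit-field, so a single device can report more than one of these flags at once (for example, CPU together with DEFAULT). The sketch below decodes such a value into readable text. To stay compilable without the OpenCL headers it uses local constants that mirror the CL_DEVICE_TYPE_* bit values from CL/cl.h; the constant names and describeDeviceType are illustrative names chosen here, not part of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Local mirrors of the CL_DEVICE_TYPE_* bit values from CL/cl.h, so the
// sketch does not need the OpenCL headers (names here are illustrative).
typedef unsigned long long DeviceTypeBits;
const DeviceTypeBits kTypeDefault     = 1ull << 0;
const DeviceTypeBits kTypeCpu         = 1ull << 1;
const DeviceTypeBits kTypeGpu         = 1ull << 2;
const DeviceTypeBits kTypeAccelerator = 1ull << 3;

// Build a " | "-separated description of every flag set in `bits`.
inline std::string describeDeviceType(DeviceTypeBits bits)
{
    struct Entry { DeviceTypeBits bit; const char* name; };
    const Entry table[] = {
        { kTypeCpu, "CPU" },
        { kTypeGpu, "GPU" },
        { kTypeAccelerator, "ACCELERATOR" },
        { kTypeDefault, "DEFAULT" }
    };
    std::string text;
    for (std::size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i)
    {
        if (bits & table[i].bit)
        {
            if (!text.empty()) text += " | ";
            text += table[i].name;
        }
    }
    return text.empty() ? std::string("UNKNOWN") : text;
}
```

In real code the argument would be the cl_device_type value returned by a CL_DEVICE_TYPE query.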
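For CPU, GPU, and accelerator devices there is no limit on the number of devices a particular platform may expose; the application is responsible for querying to determine the actual number. The following example shows how, given a platform, to use clGetDeviceIDs to query for and select a GPU device, first checking that at least one such device exists: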
cl_int errNum;
cl_uint numDevices = 0;
cl_device_id deviceIds[1];
// Ask only for the number of GPU devices on this platform.
errNum = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
if (errNum != CL_SUCCESS || numDevices < 1)
{
    std::cout << "No GPU device found for platform " << platform << std::endl;
    exit(1);
}
// At least one GPU exists, so fetch the first one.
errNum = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &deviceIds[0], NULL);
Given a device, its properties can be queried with the following command:
extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceInfo(cl_device_id device,
cl_device_info param_name,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0;
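This command returns specific information about the OpenCL device. The valid values for param_name are listed below. The size of a returned value can be queried by setting param_value_size and param_value to 0 and NULL, respectively.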
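cl_device_info  Return type  Description
CL_DEVICE_TYPE  cl_device_type  The OpenCL device type; see Table 3-2 for the valid types.
CL_DEVICE_VENDOR_ID  cl_uint  A unique device vendor identifier.
CL_DEVICE_MAX_COMPUTE_UNITS  cl_uint  The number of parallel compute cores on the OpenCL device.
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS  cl_uint  The maximum number of dimensions of the global and local work-item IDs used by the data-parallel execution model.
CL_DEVICE_MAX_WORK_ITEM_SIZES  size_t[]  The maximum number of work-items that can be specified in each dimension of the work-group to clEnqueueNDRangeKernel. Returns n size_t entries, where n is the value returned by the query for CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. The minimum value is (1, 1, 1).
CL_DEVICE_MAX_WORK_GROUP_SIZE  size_t  The maximum number of work-items in a work-group executing a kernel using the data-parallel execution model.
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF  cl_uint  The preferred native vector width for built-in scalar types that can be put into vectors, defined as the number of scalar elements that can be stored in the vector.
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF  cl_uint  The native instruction set architecture (ISA) vector width, where the vector width is defined as the number of scalar elements that can be stored in the vector.
CL_DEVICE_MAX_CLOCK_FREQUENCY  cl_uint  The maximum configured clock frequency of the device, in MHz.
CL_DEVICE_ADDRESS_BITS  cl_uint  The default size of the device address space, specified as an unsigned integer value in bits.
CL_DEVICE_MAX_MEM_ALLOC_SIZE  cl_ulong  The maximum size of a memory object allocation, in bytes.
CL_DEVICE_IMAGE_SUPPORT  cl_bool  CL_TRUE if images are supported by the OpenCL device; CL_FALSE otherwise.
CL_DEVICE_MAX_READ_IMAGE_ARGS  cl_uint  The maximum number of image objects that a kernel can read simultaneously. The minimum value is 128 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
CL_DEVICE_MAX_WRITE_IMAGE_ARGS  cl_uint  The maximum number of image objects that a kernel can write to simultaneously. The minimum value is 8 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
CL_DEVICE_IMAGE2D_MAX_WIDTH  size_t  The maximum width of a 2D image, in pixels.
CL_DEVICE_IMAGE2D_MAX_HEIGHT  size_t  The maximum height of a 2D image, in pixels.
CL_DEVICE_IMAGE3D_MAX_WIDTH  size_t  The maximum width of a 3D image, in pixels.
CL_DEVICE_IMAGE3D_MAX_HEIGHT  size_t  The maximum height of a 3D image, in pixels.
CL_DEVICE_IMAGE3D_MAX_DEPTH  size_t  The maximum depth of a 3D image, in pixels.
CL_DEVICE_MAX_SAMPLERS  cl_uint  The maximum number of samplers that can be used in a kernel.
CL_DEVICE_MAX_PARAMETER_SIZE  size_t  The maximum size, in bytes, of the arguments that can be passed to a kernel.
CL_DEVICE_MEM_BASE_ADDR_ALIGN  cl_uint  The alignment, in bits, of the base address of any allocated memory object.
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE  cl_uint  The smallest alignment, in bytes, that can be used for any data type.
CL_DEVICE_SINGLE_FP_CONFIG  cl_device_fp_config  Describes the single-precision floating-point capability of the device. This is a bit-field that describes one or more of the following values:
    CL_FP_DENORM: denormalized numbers are supported.
    CL_FP_INF_NAN: INF and quiet NaNs are supported.
    CL_FP_ROUND_TO_NEAREST: the round-to-nearest-even rounding mode is supported.
    CL_FP_ROUND_TO_ZERO: the round-to-zero rounding mode is supported.
    CL_FP_ROUND_TO_INF: the round-to-positive-infinity and round-to-negative-infinity rounding modes are supported.
    CL_FP_FMA: the IEEE 754-2008 fused multiply-add is supported.
    CL_FP_SOFT_FLOAT: basic floating-point operations (such as addition, subtraction, and multiplication) are implemented in software.
    The minimum floating-point capability required is CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN.
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE  cl_device_mem_cache_type  The type of global memory cache supported. Valid values are CL_NONE, CL_READ_ONLY_CACHE, and CL_READ_WRITE_CACHE.
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE  cl_uint  The size of a global memory cache line, in bytes.
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE  cl_ulong  The size of the global memory cache, in bytes.
CL_DEVICE_GLOBAL_MEM_SIZE  cl_ulong  The size of global device memory, in bytes.
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE  cl_ulong  The maximum size of a constant buffer allocation, in bytes.
CL_DEVICE_MAX_CONSTANT_ARGS  cl_uint  The maximum number of arguments declared with the __constant qualifier in a kernel.
CL_DEVICE_LOCAL_MEM_TYPE  cl_device_local_mem_type  The type of local memory supported. This is either CL_LOCAL, which implies dedicated local memory storage such as SRAM, or CL_GLOBAL.
CL_DEVICE_LOCAL_MEM_SIZE  cl_ulong  The size of the local memory arena, in bytes.
CL_DEVICE_ERROR_CORRECTION_SUPPORT  cl_bool  CL_TRUE if the device implements error correction for its memories, caches, registers, and so on; CL_FALSE if it does not. This can be a requirement for certain clients of OpenCL.
CL_DEVICE_HOST_UNIFIED_MEMORY  cl_bool  CL_TRUE if the device and the host have a unified memory subsystem; CL_FALSE otherwise.
CL_DEVICE_PROFILING_TIMER_RESOLUTION  size_t  The resolution of the device timer, in nanoseconds.
CL_DEVICE_ENDIAN_LITTLE  cl_bool  CL_TRUE if the OpenCL device is a little-endian device; CL_FALSE otherwise.
CL_DEVICE_AVAILABLE  cl_bool  CL_TRUE if the device is available; CL_FALSE otherwise.
CL_DEVICE_COMPILER_AVAILABLE  cl_bool  CL_FALSE if the implementation does not provide a compiler for program source; CL_TRUE if a compiler is available.
CL_DEVICE_EXECUTION_CAPABILITIES  cl_device_exec_capabilities  Describes the execution capabilities of the device. This is a bit-field that describes one or more of the following values:
    CL_EXEC_KERNEL: the OpenCL device can execute OpenCL kernels.
    CL_EXEC_NATIVE_KERNEL: the OpenCL device can execute native kernels.
    The minimum capability required is CL_EXEC_KERNEL.
CL_DEVICE_QUEUE_PROPERTIES  cl_command_queue_properties  Describes the command-queue properties supported by the device. This is a bit-field that describes one or more of the following values: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, CL_QUEUE_PROFILING_ENABLE. The minimum capability required is CL_QUEUE_PROFILING_ENABLE.
CL_DEVICE_PLATFORM  cl_platform_id  The platform associated with this device.
CL_DEVICE_NAME  char[]  Device name string.
CL_DEVICE_VENDOR  char[]  Vendor name string.
CL_DRIVER_VERSION  char[]  OpenCL software driver version string, in the form major_number.minor_number.
CL_DEVICE_PROFILE  char[]  OpenCL profile string. Returns the name of the profile supported by the device, which is one of the following strings:
    FULL_PROFILE: the device supports the OpenCL specification (the functionality defined as part of the core specification, without requiring any extensions to be supported).
    EMBEDDED_PROFILE: the device supports the OpenCL embedded profile.
CL_DEVICE_VERSION  char[]  OpenCL version string. Returns the OpenCL version supported by the device. The version string has the following format:
    OpenCL<space><major_version.minor_version><space><vendor-specific information>
CL_DEVICE_EXTENSIONS  char[]  Returns a space-separated list of the extension names supported by the device (the extension names themselves contain no spaces). The list can include vendor-supported extension names and one or more of the following approved extension names:
    cl_khr_fp64, cl_khr_int64_base_atomics, cl_khr_int64_extended_atomics, cl_khr_fp16, cl_khr_gl_sharing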
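Because CL_DEVICE_VERSION follows the fixed layout OpenCL<space><major_version.minor_version><space><vendor-specific information>, the numeric version can be extracted with ordinary stream parsing. The following is a minimal sketch of one way to do that; ClVersion and parseDeviceVersion are names invented for this illustration, and the vendor text in the usage note is a made-up sample, not output from a real driver.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parsed form of a CL_DEVICE_VERSION string (illustrative type).
struct ClVersion
{
    int majorVersion;
    int minorVersion;
    bool valid;   // false when the text does not match the expected layout
};

// Parse "OpenCL <major>.<minor> <vendor-specific information>".
// Anything after the minor version number is ignored.
inline ClVersion parseDeviceVersion(const std::string& text)
{
    ClVersion version = { 0, 0, false };
    std::istringstream stream(text);
    std::string prefix;
    char dot = '\0';
    if (stream >> prefix >> version.majorVersion >> dot >> version.minorVersion
        && prefix == "OpenCL" && dot == '.')
    {
        version.valid = true;
    }
    return version;
}
```

For instance, parsing "OpenCL 1.1 ExampleVendor build 42" yields major version 1 and minor version 1, which an application could compare against the minimum version it needs before using newer queries.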
The following simple example shows how to use clGetDeviceInfo() to query a device for its maximum number of compute units:
cl_int err;
size_t size;
cl_uint maxComputeUnits;  // receives the queried value
err = clGetDeviceInfo(deviceID, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &maxComputeUnits, &size);
std::cout << "Device has max compute units: " << maxComputeUnits << std::endl;
With the ATI Stream SDK, this code displays the following result for an Intel i7 CPU device:
Device 4098 has max compute units: 8
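Queried limits are usually consumed rather than just printed. As one hedged illustration, the values described for CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE can be used to check a proposed work-group shape before a kernel is enqueued. The helper name localSizeFits is invented here, and the limits are passed in as plain values so that the sketch does not depend on the OpenCL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when a proposed local work size respects both limits:
// each dimension is at most the matching CL_DEVICE_MAX_WORK_ITEM_SIZES
// entry, and the product of all dimensions is at most
// CL_DEVICE_MAX_WORK_GROUP_SIZE.
inline bool localSizeFits(const std::vector<std::size_t>& localSize,
                          const std::vector<std::size_t>& maxItemSizes,
                          std::size_t maxGroupSize)
{
    if (localSize.empty() || localSize.size() > maxItemSizes.size())
    {
        return false;
    }
    std::size_t total = 1;
    for (std::size_t i = 0; i < localSize.size(); ++i)
    {
        if (localSize[i] == 0 || localSize[i] > maxItemSizes[i])
        {
            return false;
        }
        // Overflow-safe test for total * localSize[i] > maxGroupSize.
        if (total > maxGroupSize / localSize[i])
        {
            return false;
        }
        total *= localSize[i];
    }
    return true;
}
```

With per-dimension limits of 1024 and a work-group limit of 256, a 16 x 16 shape passes while a 32 x 32 shape is rejected, because 1024 work-items exceed the group limit even though each dimension is individually legal.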
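In the OpenCLInfo.cpp listing that follows, the template class InfoDevice does the actual work, providing the public method display() to retrieve and display the requested information. The earlier example that queried a device's maximum number of compute units can be rewritten as: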
InfoDevice<cl_uint>::display(deviceID, CL_DEVICE_MAX_COMPUTE_UNITS, "DEVICE has max compute units");
OpenCLInfo.cpp
//
// Book: OpenCL(R) Programming Guide
// Authors: Aaftab Munshi, Benedict Gaster, Dan Ginsburg, Timothy Mattson
// ISBN-10: ??????????
// ISBN-13: ?????????????
// Publisher: Addison-Wesley Professional
// URLs: http://safari.informit.com/??????????
// http://www.????????.com
//
// OpenCLInfo.cpp
//
// This is a simple example that demonstrates use of the clGetInfo* functions,
// with particular focus on platforms and their associated devices.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string> // std::string is used throughout this file
#if defined(_WIN32)
#include <malloc.h> // needed for alloca
#endif // _WIN32
#if defined(linux) || defined(__APPLE__) || defined(__MACOSX)
# include <alloca.h>
#endif // linux
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
///
// Display information for a particular platform.
// Assumes that all calls to clGetPlatformInfo return
// a value of type char[], which is valid for OpenCL 1.1.
//
void DisplayPlatformInfo(
cl_platform_id id,
cl_platform_info name,
std::string str)
{
cl_int errNum;
std::size_t paramValueSize;
errNum = clGetPlatformInfo(
id,
name,
0,
NULL,
&paramValueSize);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
return;
}
char* info = (char*)alloca(sizeof(char) * paramValueSize);
errNum = clGetPlatformInfo(
id,
name,
paramValueSize,
info,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
return;
}
std::cout << "\t" << str << ":\t" << info << std::endl;
}
template<typename T>
void appendBitfield(T info, T value, std::string name, std::string& str)
{
if (info & value)
{
if (str.length() > 0)
{
str.append(" | ");
}
str.append(name);
}
}
///
// Display information for a particular device.
// As different calls to clGetDeviceInfo may return
// values of different types a template is used.
// As some values returned are arrays of values, a templated class is
// used so it can be specialized for this case, see below.
//
template <typename T>
class InfoDevice
{
public:
static void display(
cl_device_id id,
cl_device_info name,
std::string str)
{
cl_int errNum;
std::size_t paramValueSize;
errNum = clGetDeviceInfo(
id,
name,
0,
NULL,
&paramValueSize);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
return;
}
T* info = (T*)alloca(sizeof(T) * paramValueSize);
errNum = clGetDeviceInfo(
id,
name,
paramValueSize,
info,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
return;
}
// Handle a few special cases
switch (name)
{
case CL_DEVICE_TYPE:
{
std::string deviceType;
appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_CPU,
"CL_DEVICE_TYPE_CPU",
deviceType);
appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_GPU,
"CL_DEVICE_TYPE_GPU",
deviceType);
appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_ACCELERATOR,
"CL_DEVICE_TYPE_ACCELERATOR",
deviceType);
appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_DEFAULT,
"CL_DEVICE_TYPE_DEFAULT",
deviceType);
std::cout << "\t\t" << str << ":\t" << deviceType << std::endl;
}
break;
case CL_DEVICE_SINGLE_FP_CONFIG:
{
std::string fpType;
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_DENORM,
"CL_FP_DENORM",
fpType);
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_INF_NAN,
"CL_FP_INF_NAN",
fpType);
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_ROUND_TO_NEAREST,
"CL_FP_ROUND_TO_NEAREST",
fpType);
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_ROUND_TO_ZERO,
"CL_FP_ROUND_TO_ZERO",
fpType);
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_ROUND_TO_INF,
"CL_FP_ROUND_TO_INF",
fpType);
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_FMA,
"CL_FP_FMA",
fpType);
#ifdef CL_FP_SOFT_FLOAT
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_SOFT_FLOAT,
"CL_FP_SOFT_FLOAT",
fpType);
#endif
std::cout << "\t\t" << str << ":\t" << fpType << std::endl;
}
break; // without this, control falls through into the cache-type case
case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:
{
std::string memType;
appendBitfield<cl_device_mem_cache_type>(
*(reinterpret_cast<cl_device_mem_cache_type*>(info)),
CL_NONE,
"CL_NONE",
memType);
appendBitfield<cl_device_mem_cache_type>(
*(reinterpret_cast<cl_device_mem_cache_type*>(info)),
CL_READ_ONLY_CACHE,
"CL_READ_ONLY_CACHE",
memType);
appendBitfield<cl_device_mem_cache_type>(
*(reinterpret_cast<cl_device_mem_cache_type*>(info)),
CL_READ_WRITE_CACHE,
"CL_READ_WRITE_CACHE",
memType);
std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
case CL_DEVICE_LOCAL_MEM_TYPE:
{
std::string memType;
appendBitfield<cl_device_local_mem_type>(
*(reinterpret_cast<cl_device_local_mem_type*>(info)),
CL_LOCAL,
"CL_LOCAL",
memType);
appendBitfield<cl_device_local_mem_type>(
*(reinterpret_cast<cl_device_local_mem_type*>(info)),
CL_GLOBAL,
"CL_GLOBAL",
memType);
std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
case CL_DEVICE_EXECUTION_CAPABILITIES:
{
std::string memType;
appendBitfield<cl_device_exec_capabilities>(
*(reinterpret_cast<cl_device_exec_capabilities*>(info)),
CL_EXEC_KERNEL,
"CL_EXEC_KERNEL",
memType);
appendBitfield<cl_device_exec_capabilities>(
*(reinterpret_cast<cl_device_exec_capabilities*>(info)),
CL_EXEC_NATIVE_KERNEL,
"CL_EXEC_NATIVE_KERNEL",
memType);
std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
case CL_DEVICE_QUEUE_PROPERTIES:
{
std::string memType;
appendBitfield<cl_command_queue_properties>(
*(reinterpret_cast<cl_command_queue_properties*>(info)),
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
"CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE",
memType);
appendBitfield<cl_command_queue_properties>(
*(reinterpret_cast<cl_command_queue_properties*>(info)),
CL_QUEUE_PROFILING_ENABLE,
"CL_QUEUE_PROFILING_ENABLE",
memType);
std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
default:
std::cout << "\t\t" << str << ":\t" << *info << std::endl;
break;
}
}
};
///
// Simple trait class used to wrap base types.
//
template <typename T>
class ArrayType
{
public:
static bool isChar() { return false; }
};
///
// Specialized for the char (i.e. null terminated string case).
//
template<>
class ArrayType<char>
{
public:
static bool isChar() { return true; }
};
///
// Specialized instance of class InfoDevice for array types.
//
template <typename T>
class InfoDevice<ArrayType<T> >
{
public:
static void display(
cl_device_id id,
cl_device_info name,
std::string str)
{
cl_int errNum;
std::size_t paramValueSize;
errNum = clGetDeviceInfo(
id,
name,
0,
NULL,
&paramValueSize);
if (errNum != CL_SUCCESS)
{
std::cerr
<< "Failed to find OpenCL device info "
<< str
<< "."
<< std::endl;
return;
}
T* info = (T*)alloca(sizeof(T) * paramValueSize);
errNum = clGetDeviceInfo(
id,
name,
paramValueSize,
info,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr
<< "Failed to find OpenCL device info "
<< str
<< "."
<< std::endl;
return;
}
if (ArrayType<T>::isChar())
{
std::cout << "\t" << str << ":\t" << info << std::endl;
}
else if (name == CL_DEVICE_MAX_WORK_ITEM_SIZES)
{
cl_uint maxWorkItemDimensions;
errNum = clGetDeviceInfo(
id,
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
sizeof(cl_uint),
&maxWorkItemDimensions,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr
<< "Failed to find OpenCL device info "
<< "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS."
<< std::endl;
return;
}
std::cout << "\t" << str << ":\t";
for (cl_uint i = 0; i < maxWorkItemDimensions; i++)
{
std::cout << info[i] << " ";
}
std::cout << std::endl;
}
}
};
///
// Enumerate platforms and display information about them
// and their associated devices.
//
void displayInfo(void)
{
cl_int errNum;
cl_uint numPlatforms;
cl_platform_id* platformIds;
cl_context context = NULL;
// First, query the total number of platforms
errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
if (errNum != CL_SUCCESS || numPlatforms <= 0)
{
std::cerr << "Failed to find any OpenCL platform." << std::endl;
return;
}
// Next, allocate memory for the installed platforms, and query
// to get the list.
platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id) * numPlatforms);
// Now fetch the platform IDs themselves
errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find any OpenCL platforms." << std::endl;
return;
}
std::cout << "Number of platforms: \t" << numPlatforms << std::endl;
// Iterate through the list of platforms displaying associated information
for (cl_uint i = 0; i < numPlatforms; i++) {
// First we display information associated with the platform
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_PROFILE,
"CL_PLATFORM_PROFILE");
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_VERSION,
"CL_PLATFORM_VERSION");
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_VENDOR,
"CL_PLATFORM_VENDOR");
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_EXTENSIONS,
"CL_PLATFORM_EXTENSIONS");
// Now query the set of devices associated with the platform
cl_uint numDevices;
errNum = clGetDeviceIDs(
platformIds[i],
CL_DEVICE_TYPE_ALL,
0,
NULL,
&numDevices);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL devices." << std::endl;
return;
}
cl_device_id* devices = (cl_device_id*)alloca(sizeof(cl_device_id) * numDevices);
errNum = clGetDeviceIDs(
platformIds[i],
CL_DEVICE_TYPE_ALL,
numDevices,
devices,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL devices." << std::endl;
return;
}
std::cout << "\tNumber of devices: \t" << numDevices << std::endl;
// Iterate through each device, displaying associated information
for (cl_uint j = 0; j < numDevices; j++)
{
InfoDevice<cl_device_type>::display(
devices[j],
CL_DEVICE_TYPE,
"CL_DEVICE_TYPE");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_VENDOR_ID,
"CL_DEVICE_VENDOR_ID");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_COMPUTE_UNITS,
"CL_DEVICE_MAX_COMPUTE_UNITS");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
"CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS");
InfoDevice<ArrayType<size_t> >::display(
devices[j],
CL_DEVICE_MAX_WORK_ITEM_SIZES,
"CL_DEVICE_MAX_WORK_ITEM_SIZES");
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_MAX_WORK_GROUP_SIZE,
"CL_DEVICE_MAX_WORK_GROUP_SIZE");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE");
#ifdef CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_INT");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF");
#endif
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_CLOCK_FREQUENCY,
"CL_DEVICE_MAX_CLOCK_FREQUENCY");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_ADDRESS_BITS,
"CL_DEVICE_ADDRESS_BITS");
InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_MAX_MEM_ALLOC_SIZE,
"CL_DEVICE_MAX_MEM_ALLOC_SIZE");
InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_IMAGE_SUPPORT,
"CL_DEVICE_IMAGE_SUPPORT");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_READ_IMAGE_ARGS,
"CL_DEVICE_MAX_READ_IMAGE_ARGS");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_WRITE_IMAGE_ARGS,
"CL_DEVICE_MAX_WRITE_IMAGE_ARGS");
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE2D_MAX_WIDTH,
"CL_DEVICE_IMAGE2D_MAX_WIDTH");
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE2D_MAX_HEIGHT,
"CL_DEVICE_IMAGE2D_MAX_HEIGHT");
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE3D_MAX_WIDTH,
"CL_DEVICE_IMAGE3D_MAX_WIDTH");
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE3D_MAX_HEIGHT,
"CL_DEVICE_IMAGE3D_MAX_HEIGHT");
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE3D_MAX_DEPTH,
"CL_DEVICE_IMAGE3D_MAX_DEPTH");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_SAMPLERS,
"CL_DEVICE_MAX_SAMPLERS");
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_MAX_PARAMETER_SIZE,
"CL_DEVICE_MAX_PARAMETER_SIZE");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MEM_BASE_ADDR_ALIGN,
"CL_DEVICE_MEM_BASE_ADDR_ALIGN");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE,
"CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE");
InfoDevice<cl_device_fp_config>::display(
devices[j],
CL_DEVICE_SINGLE_FP_CONFIG,
"CL_DEVICE_SINGLE_FP_CONFIG");
InfoDevice<cl_device_mem_cache_type>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
"CL_DEVICE_GLOBAL_MEM_CACHE_TYPE");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
"CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE");
InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
"CL_DEVICE_GLOBAL_MEM_CACHE_SIZE");
InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_SIZE,
"CL_DEVICE_GLOBAL_MEM_SIZE");
InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE,
"CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE");
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_CONSTANT_ARGS,
"CL_DEVICE_MAX_CONSTANT_ARGS");
InfoDevice<cl_device_local_mem_type>::display(
devices[j],
CL_DEVICE_LOCAL_MEM_TYPE,
"CL_DEVICE_LOCAL_MEM_TYPE");
InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_LOCAL_MEM_SIZE,
"CL_DEVICE_LOCAL_MEM_SIZE");
InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_ERROR_CORRECTION_SUPPORT,
"CL_DEVICE_ERROR_CORRECTION_SUPPORT");
#ifdef CL_DEVICE_HOST_UNIFIED_MEMORY
InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_HOST_UNIFIED_MEMORY,
"CL_DEVICE_HOST_UNIFIED_MEMORY");
#endif
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_PROFILING_TIMER_RESOLUTION,
"CL_DEVICE_PROFILING_TIMER_RESOLUTION");
InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_ENDIAN_LITTLE,
"CL_DEVICE_ENDIAN_LITTLE");
InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_AVAILABLE,
"CL_DEVICE_AVAILABLE");
InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_COMPILER_AVAILABLE,
"CL_DEVICE_COMPILER_AVAILABLE");
InfoDevice<cl_device_exec_capabilities>::display(
devices[j],
CL_DEVICE_EXECUTION_CAPABILITIES,
"CL_DEVICE_EXECUTION_CAPABILITIES");
InfoDevice<cl_command_queue_properties>::display(
devices[j],
CL_DEVICE_QUEUE_PROPERTIES,
"CL_DEVICE_QUEUE_PROPERTIES");
InfoDevice<cl_platform_id>::display(
devices[j],
CL_DEVICE_PLATFORM,
"CL_DEVICE_PLATFORM");
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_NAME,
"CL_DEVICE_NAME");
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_VENDOR,
"CL_DEVICE_VENDOR");
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DRIVER_VERSION,
"CL_DRIVER_VERSION");
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_PROFILE,
"CL_DEVICE_PROFILE");
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_VERSION,
"CL_DEVICE_VERSION");
#ifdef CL_DEVICE_OPENCL_C_VERSION
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_OPENCL_C_VERSION,
"CL_DEVICE_OPENCL_C_VERSION");
#endif
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_EXTENSIONS,
"CL_DEVICE_EXTENSIONS");
std::cout << std::endl << std::endl;
}
}
}
///
// main() for OpenCLInfo example
//
int main(int argc, char** argv)
{
cl_context context = 0;
displayInfo();
return 0;
}
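One detail of the listing is worth a second look. CL_DEVICE_GLOBAL_MEM_CACHE_TYPE is decoded with appendBitfield, but cl_device_mem_cache_type is an enumeration rather than a bit-field, and CL_NONE has the value 0, so the test info & CL_NONE can never succeed and a device without a cache prints an empty string. A possible alternative, sketched below, is to switch on the value instead. The constants are local mirrors of the CL/cl.h values so that the block compiles without the OpenCL headers, and describeCacheType is a name invented for this sketch.

```cpp
#include <cassert>
#include <string>

// Local mirrors of the cl_device_mem_cache_type values from CL/cl.h
// (CL_NONE, CL_READ_ONLY_CACHE, CL_READ_WRITE_CACHE).
const unsigned kCacheNone      = 0x0;
const unsigned kCacheReadOnly  = 0x1;
const unsigned kCacheReadWrite = 0x2;

// Treat the cache type as an enumerated value rather than a bit-field,
// so that the zero-valued CL_NONE is reported correctly.
inline std::string describeCacheType(unsigned value)
{
    switch (value)
    {
    case kCacheNone:      return "CL_NONE";
    case kCacheReadOnly:  return "CL_READ_ONLY_CACHE";
    case kCacheReadWrite: return "CL_READ_WRITE_CACHE";
    default:              return "UNKNOWN";
    }
}
```

Inside InfoDevice::display, the same switch could presumably replace the three appendBitfield calls in the CL_DEVICE_GLOBAL_MEM_CACHE_TYPE case.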