16. WebGPU Data Memory Layout

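In WebGPU, nearly all of the data you provide needs to be laid out in memory to match what you define in your shaders. This is a big contrast to JavaScript and TypeScript, where memory layout issues rarely come up.

In WGSL, when you write your shaders it is common to define a struct. Structs are a bit like JavaScript objects: you declare the members of a struct, similar to the properties of a JavaScript object. But on top of giving each member a name, you also have to give it a type. And when providing the data, it is up to you to compute where in the buffer that particular member of the struct will appear.

In WGSL v1, there are 4 base types: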

f32 (a 32bit floating point number)
i32 (a 32bit integer)
u32 (a 32bit unsigned integer)
f16 (a 16bit floating point number) [note 1]

A byte is 8 bits, so a 32-bit value takes 4 bytes and a 16-bit value takes 2 bytes.

If we declare a struct like this

struct OurStruct {
  velocity: f32,
  acceleration: f32,
  frameCount: u32,
};

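A visual representation of that struct might look something like this

[Figure: byte layout of OurStruct]

Each square block in the diagram is a byte. The data takes 12 bytes. velocity takes the first 4 bytes, acceleration takes the next 4, and frameCount takes the last 4.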

To pass data to the shader we need to prepare data that matches the memory layout of OurStruct. To do that we make an ArrayBuffer of 12 bytes, then set up TypedArray views of the correct types so we can fill it out.

const kOurStructSizeBytes =
  4 + // velocity
  4 + // acceleration
  4 ; // frameCount
const ourStructData = new ArrayBuffer(kOurStructSizeBytes);
const ourStructValuesAsF32 = new Float32Array(ourStructData);
const ourStructValuesAsU32 = new Uint32Array(ourStructData);

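Above, ourStructData is an ArrayBuffer, which is a chunk of memory. To look at the contents of this memory we create views of it. ourStructValuesAsF32 is a view of the memory as 32-bit floating point values. ourStructValuesAsU32 is a view of the same memory as 32-bit unsigned integer values.

Now that we have a buffer and 2 views, we can set the data in the struct.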
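
To make the idea of two views over the same memory concrete, here is a minimal sketch (the variable names are illustrative and not part of the code above). Writing 1.0 through the float view and reading it back through the unsigned integer view shows the raw IEEE 754 bit pattern rather than the number 1.

```javascript
// One 4-byte block of memory with two views onto it.
const bytes = new ArrayBuffer(4);
const asF32 = new Float32Array(bytes);
const asU32 = new Uint32Array(bytes);

asF32[0] = 1.0;                      // write through the float view
const bits = asU32[0].toString(16);  // read the same 4 bytes as an integer
console.log(bits);                   // 3f800000, the bit pattern of 1.0 as an f32

asU32[0] = 0;                        // clearing through one view...
console.log(asF32[0]);               // ...is visible through the other: 0
```

This is why frameCount has to be written through the Uint32Array view: writing 56 through the Float32Array view would store the float encoding of 56, not the integer.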

const kVelocityOffset = 0;
const kAccelerationOffset = 1;
const kFrameCountOffset = 2;
 
ourStructValuesAsF32[kVelocityOffset] = 1.2;
ourStructValuesAsF32[kAccelerationOffset] = 3.4;
ourStructValuesAsU32[kFrameCountOffset] = 56;    // an integer value

Note that, like many things in programming, there are multiple ways we could do this. TypedArray has a constructor that takes various forms. For example:

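new Float32Array(12)

This version makes a new ArrayBuffer, in this case of 12 * 4 bytes. It then creates the Float32Array to view it.

new Float32Array([4, 5, 6])

This version makes a new ArrayBuffer, in this case of 3 * 4 bytes. It then creates the Float32Array to view it and sets the initial values to 4, 5, 6.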

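Note that you can also pass another TypedArray. For example

new Float32Array(someUint8ArrayOf6Values) makes a new ArrayBuffer of size 6 * 4 bytes, then creates a Float32Array to view it, then copies the values from the existing view into the new Float32Array. The values are copied by number, not in binary. In other words, they are copied like this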
```
srcArray.forEach((v, i) => dstArray[i] = v); 
```
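new Float32Array(someArrayBuffer)

This is the case we used before. A new Float32Array view is created on an existing buffer.

new Float32Array(someArrayBuffer, byteOffset)

This makes a new Float32Array on an existing buffer but starts the view at byteOffset.

new Float32Array(someArrayBuffer, byteOffset, length)

This makes a new Float32Array on an existing buffer. The view starts at byteOffset and is length units long. So if we passed 3 for length, the view would be 3 float32 values long (12 bytes) of someArrayBuffer.

Using this last form we could change the code above to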

const kOurStructSizeBytes =
  4 + // velocity
  4 + // acceleration
  4 ; // frameCount
const ourStructData = new ArrayBuffer(kOurStructSizeBytes);
const velocityView = new Float32Array(ourStructData, 0, 1);     // starts at byte 0
const accelerationView = new Float32Array(ourStructData, 4, 1); // starts at byte 4
const frameCountView = new Uint32Array(ourStructData, 8, 1);    // starts at byte 8
 
velocityView[0] = 1.2;
accelerationView[0] = 3.4;
frameCountView[0] = 56;

Further, every TypedArray has the following properties

length: number of units
byteLength: size in bytes
byteOffset: offset in the TypedArray's ArrayBuffer
buffer: the ArrayBuffer this TypedArray is viewing
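
The constructor forms and properties listed above can be checked directly. The following is a small sketch; the buffer size of 32 bytes and the byte offset of 8 are arbitrary values chosen for illustration.

```javascript
// Length form: 12 elements, so the new ArrayBuffer is 12 * 4 = 48 bytes.
const byLength = new Float32Array(12);

// TypedArray form: values are converted number by number into a NEW buffer.
const someUint8ArrayOf6Values = new Uint8Array([1, 2, 3, 4, 5, 6]);
const copied = new Float32Array(someUint8ArrayOf6Values);

// ArrayBuffer form with byteOffset and length: a view, not a copy.
const someArrayBuffer = new ArrayBuffer(32);
const threeFloats = new Float32Array(someArrayBuffer, 8, 3);

console.log(byLength.byteLength);                               // 48
console.log(copied.length, copied.byteLength);                  // 6 24
console.log(copied.buffer === someUint8ArrayOf6Values.buffer);  // false
console.log(threeFloats.length, threeFloats.byteLength);        // 3 12
console.log(threeFloats.byteOffset);                            // 8
console.log(threeFloats.buffer === someArrayBuffer);            // true
```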

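TypedArray has various methods, many of them similar to Array, but one that is not is subarray. It creates a new TypedArray view of the same type. Its parameters are subarray(begin, end), where end is not included, that is, the range [begin, end). So someTypedArray.subarray(5, 10) makes a new TypedArray over the same ArrayBuffer covering elements 5 to 9 of someTypedArray.

So we could change the code above to this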

const kOurStructSizeFloat32Units =
  1 + // velocity
  1 + // acceleration
  1 ; // frameCount
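const ourStructDataAsF32 = new Float32Array(kOurStructSizeFloat32Units);
// A second view of the same memory, used for the unsigned integer member.
const ourStructDataAsU32 = new Uint32Array(ourStructDataAsF32.buffer);
const velocityView = ourStructDataAsF32.subarray(0, 1);
const accelerationView = ourStructDataAsF32.subarray(1, 2);
const frameCountView = ourStructDataAsU32.subarray(2, 3);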
 
velocityView[0] = 1.2;
accelerationView[0] = 3.4;
frameCountView[0] = 56;
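
Because subarray returns a view rather than a copy, writes through it land in the original buffer. The sketch below contrasts it with slice, which does copy; the array contents are arbitrary illustration values.

```javascript
const source = new Float32Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);

const shared = source.subarray(5, 10); // view of elements 5..9, same buffer
const copy = source.slice(5, 10);      // independent copy of elements 5..9

shared[0] = 500; // writes through to source[5]
copy[1] = 600;   // does not touch source[6]

console.log(source[5], source[6]);            // 500 6
console.log(shared.byteOffset);               // 20 (5 elements * 4 bytes)
console.log(shared.buffer === source.buffer); // true
console.log(copy.buffer === source.buffer);   // false
```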

WGSL has types that are made from the 4 base types. They are:

type	description	short name
`vec2<f32>`	a type with 2 `f32`s	`vec2f`
`vec2<u32>`	a type with 2 `u32`s	`vec2u`
`vec2<i32>`	a type with 2 `i32`s	`vec2i`
`vec2<f16>`	a type with 2 `f16`s	`vec2h`
`vec3<f32>`	a type with 3 `f32`s	`vec3f`
`vec3<u32>`	a type with 3 `u32`s	`vec3u`
`vec3<i32>`	a type with 3 `i32`s	`vec3i`
`vec3<f16>`	a type with 3 `f16`s	`vec3h`
`vec4<f32>`	a type with 4 `f32`s	`vec4f`
`vec4<u32>`	a type with 4 `u32`s	`vec4u`
`vec4<i32>`	a type with 4 `i32`s	`vec4i`
`vec4<f16>`	a type with 4 `f16`s	`vec4h`
`mat2x2<f32>`	a matrix of 2 `vec2<f32>`s	`mat2x2f`
`mat2x2<f16>`	a matrix of 2 `vec2<f16>`s	`mat2x2h`
`mat2x3<f32>`	a matrix of 2 `vec3<f32>`s	`mat2x3f`
`mat2x3<f16>`	a matrix of 2 `vec3<f16>`s	`mat2x3h`
`mat2x4<f32>`	a matrix of 2 `vec4<f32>`s	`mat2x4f`
`mat2x4<f16>`	a matrix of 2 `vec4<f16>`s	`mat2x4h`
`mat3x2<f32>`	a matrix of 3 `vec2<f32>`s	`mat3x2f`
`mat3x2<f16>`	a matrix of 3 `vec2<f16>`s	`mat3x2h`
`mat3x3<f32>`	a matrix of 3 `vec3<f32>`s	`mat3x3f`
`mat3x3<f16>`	a matrix of 3 `vec3<f16>`s	`mat3x3h`
`mat3x4<f32>`	a matrix of 3 `vec4<f32>`s	`mat3x4f`
`mat3x4<f16>`	a matrix of 3 `vec4<f16>`s	`mat3x4h`
`mat4x2<f32>`	a matrix of 4 `vec2<f32>`s	`mat4x2f`
`mat4x2<f16>`	a matrix of 4 `vec2<f16>`s	`mat4x2h`
`mat4x3<f32>`	a matrix of 4 `vec3<f32>`s	`mat4x3f`
`mat4x3<f16>`	a matrix of 4 `vec3<f16>`s	`mat4x3h`
`mat4x4<f32>`	a matrix of 4 `vec4<f32>`s	`mat4x4f`
`mat4x4<f16>`	a matrix of 4 `vec4<f16>`s	`mat4x4h`

Given that vec3f is a type with 3 f32s and mat4x4f is a 4x4 matrix of f32s, so 16 f32s, what do you think the following struct looks like in memory?

struct Ex2 {
  scale: f32,
  offset: vec3f, //align 16
  projection: mat4x4f,
};

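Ready? Here is the answer!

[Figure: byte layout of Ex2]

What's up with that? It turns out every type has an alignment requirement. A value of a given type must be placed at an offset that is a multiple of a certain number of bytes.

Here are the sizes and alignments of the various types, in bytes.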

type	size	align
i32	4	4
u32	4	4
f32	4	4
f16	2	2
vec2<i32>	8	8
vec2<u32>	8	8
vec2<f32>	8	8
vec2<f16>	4	4
vec3<i32>	12	16
vec3<u32>	12	16
vec3<f32>	12	16
vec3<f16>	6	8
vec4<i32>	16	16
vec4<u32>	16	16
vec4<f32>	16	16
vec4<f16>	8	8
mat2x2<f32>	16	8
mat2x2<f16>	8	4
mat3x2<f32>	24	8
mat3x2<f16>	12	4
mat4x2<f32>	32	8
mat4x2<f16>	16	4
mat2x3<f32>	32	16
mat2x3<f16>	16	8
mat3x3<f32>	48	16
mat3x3<f16>	24	8
mat4x3<f32>	64	16
mat4x3<f16>	32	8
mat2x4<f32>	32	16
mat2x4<f16>	16	8
mat3x4<f32>	48	16
mat3x4<f16>	24	8
mat4x4<f32>	64	16
mat4x4<f16>	32	8
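
As a rough illustration of how the sizes and alignments in the table turn into offsets, here is a simplified sketch that lays out the Ex2 struct. The helper computeLayout and the typeInfo lookup are hypothetical names invented for this example, and the sketch only knows the three types Ex2 uses; it does not handle arrays or nested structs.

```javascript
// Round `value` up to the next multiple of `align`.
const roundUp = (align, value) => Math.ceil(value / align) * align;

// Size and alignment in bytes, taken from the table above.
const typeInfo = {
  f32:     { size: 4,  align: 4 },
  vec3f:   { size: 12, align: 16 },
  mat4x4f: { size: 64, align: 16 },
};

// Hypothetical helper: walk the members in order, aligning each one.
function computeLayout(members) {
  let offset = 0;
  let structAlign = 1;
  const offsets = {};
  for (const [name, type] of members) {
    const { size, align } = typeInfo[type];
    offset = roundUp(align, offset);  // skip padding bytes if needed
    offsets[name] = offset;
    offset += size;
    structAlign = Math.max(structAlign, align);
  }
  // The struct's size is rounded up to its largest member alignment.
  return { offsets, size: roundUp(structAlign, offset) };
}

const ex2 = computeLayout([
  ['scale', 'f32'],
  ['offset', 'vec3f'],
  ['projection', 'mat4x4f'],
]);
console.log(ex2.offsets); // { scale: 0, offset: 16, projection: 32 }
console.log(ex2.size);    // 96
```

The 12 bytes between scale and offset are padding: offset is a vec3f, so it has to start at a multiple of 16.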

But wait, there's MORE!

What do you think the layout of this struct will be?

struct Ex3 {
  transform: mat3x3f,
  directions: array<vec3f, 4>,
};

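The array<type, count> syntax defines an array of type with count elements.

Here is the answer…

[Figure: byte layout of Ex3]

If you look in the alignment table you will see that vec3<f32> has an alignment of 16 bytes. That means each vec3<f32>, whether it is in a matrix or an array, ends up with extra padding after it.

Here is another example: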
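
In practice this padding means a tightly packed 3x3 matrix (9 floats) cannot be copied into a mat3x3f as is. The sketch below shows one way to insert the padding; padMat3x3 is a hypothetical helper written for this illustration, not a WebGPU or library function.

```javascript
// Hypothetical helper: copy 9 tightly packed floats (3 columns of 3)
// into the 12-float layout WGSL expects for a mat3x3f.
function padMat3x3(packed) {
  const padded = new Float32Array(12); // 3 columns * 4 floats = 48 bytes
  for (let col = 0; col < 3; ++col) {
    // Each column starts on a 16-byte (4-float) boundary.
    padded.set(packed.subarray(col * 3, col * 3 + 3), col * 4);
  }
  return padded;
}

const identity = new Float32Array([1, 0, 0,  0, 1, 0,  0, 0, 1]);
const paddedIdentity = padMat3x3(identity);
console.log(Array.from(paddedIdentity)); // [1,0,0,0, 0,1,0,0, 0,0,1,0]
console.log(paddedIdentity.byteLength);  // 48
```

The same applies to array<vec3f, 4>: each element occupies 16 bytes, so the array takes 64 bytes and Ex3 as a whole takes 48 + 64 = 112 bytes.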

struct Ex4a {
  velocity: vec3f,
};
 
struct Ex4 {
  orientation: vec3f,
  size: f32, // align 4
  direction: array<vec3f, 1>, //align  16    
  scale: f32,
  info: Ex4a, //align  16
  friction: f32,
};

[Figure: byte layout of Ex4]

Why did size end up at byte offset 12, just after orientation, while scale and friction got bumped to offsets 32 and 64?

That is because arrays and structs have their own special alignment rules. So even though the array is a single vec3f and the Ex4a struct is also a single vec3f, they get aligned according to different rules than a bare vec3f member.
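
For comparison with the library approach, here is a sketch of setting up views for Ex4 by hand. Offsets 12, 32 and 64 come from the text above; the offsets 16 and 48 and the total size of 80 bytes are derived from the alignment rules (a 16-byte alignment for the array and the nested struct, and a struct size rounded up to a multiple of 16), so treat them as this example's working, not as values stated by the original.

```javascript
// 68 bytes of members, rounded up to a multiple of the struct alignment (16).
const kEx4SizeBytes = 80;
const ex4Data = new ArrayBuffer(kEx4SizeBytes);

// One view per member: (buffer, byte offset, length in floats).
const orientationView = new Float32Array(ex4Data, 0, 3);
const sizeView = new Float32Array(ex4Data, 12, 1);         // fits in the vec3f's padding
const directionView = new Float32Array(ex4Data, 16, 3);    // array<vec3f, 1>
const scaleView = new Float32Array(ex4Data, 32, 1);        // after the array's padding
const infoVelocityView = new Float32Array(ex4Data, 48, 3); // Ex4a.velocity
const frictionView = new Float32Array(ex4Data, 64, 1);     // after the struct's padding

orientationView.set([1, 0, -1]);
sizeView[0] = 2;
directionView.set([0, 1, 0]);
scaleView[0] = 1.5;
infoVelocityView.set([2, 3, 4]);
frictionView[0] = 0.5;

// Look at the whole buffer as floats to confirm where the values landed.
const allFloats = new Float32Array(ex4Data);
console.log(allFloats.length);                              // 20
console.log(allFloats[3], allFloats[8], allFloats[16]);     // 2 1.5 0.5
```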

Computing Offsets and Sizes is a PITA!

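Computing sizes and offsets of data in WGSL is probably the largest pain point of WebGPU. You are required to compute these offsets yourself and keep them up to date. If you add a member somewhere in the middle of a struct in your shaders, you need to go back to your JavaScript and update all the offsets. Get a single byte or length wrong and the data you pass to the shader will be wrong. You won't get an error, but your shader will likely do the wrong thing because it is looking at bad data. Your model won't draw, or your computation will produce bad results.

Fortunately there are libraries to help with this.

webgpu-utils is one of them.

You give it your WGSL code and it gives you an API that does all of this for you. This way you can change your structs and, more often than not, things will just work.

For example, using that last example we can pass it to webgpu-utils like this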

import {
  makeShaderDataDefinitions,
  makeStructuredView,
} from 'https://greggman.github.io/webgpu-utils/dist/0.x/webgpu-utils.module.js';
 
const code = `
struct Ex4a {
  velocity: vec3f,
};
 
struct Ex4 {
  orientation: vec3f,
  size: f32,
  direction: array<vec3f, 1>,
  scale: f32,
  info: Ex4a,
  friction: f32,
};
@group(0) @binding(0) var<uniform> myUniforms: Ex4;
 
...
`;
 
const defs = makeShaderDataDefinitions(code);
const myUniformValues = makeStructuredView(defs.uniforms.myUniforms);
 
// Set some values via set
myUniformValues.set({
  orientation: [1, 0, -1],
  size: 2,
  direction: [0, 1, 0],
  scale: 1.5,
  info: {
    velocity: [2, 3, 4],
  },
  friction: 0.1,
});
 
// now pass myUniformValues.arrayBuffer to WebGPU when needed.
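
As a sketch of that last step, the data can be copied into a GPU buffer with the device queue's writeBuffer. The function uploadUniforms below is a hypothetical helper written for this illustration; it assumes device is a GPUDevice and uniformBuffer is a GPUBuffer created elsewhere with the UNIFORM and COPY_DST usage flags and at least as many bytes as the struct.

```javascript
// Hypothetical helper: copy CPU-side struct data into an existing GPU buffer.
// Assumes `device` is a GPUDevice and `uniformBuffer` is a GPUBuffer created
// with GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST.
function uploadUniforms(device, uniformBuffer, arrayBuffer) {
  if (arrayBuffer.byteLength > uniformBuffer.size) {
    throw new RangeError('data is larger than the GPU buffer');
  }
  // Copy the whole ArrayBuffer to the start of the GPU buffer.
  device.queue.writeBuffer(uniformBuffer, 0, arrayBuffer);
}

// Usage, in a page where WebGPU is available:
//   uploadUniforms(device, uniformBuffer, myUniformValues.arrayBuffer);
```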