TensorFlow Basics (3): Gradients and Automatic Differentiation

2025/4/18 20:37:02

Contents

Computing gradients
Gradient tapes
Gradients with respect to a model
Controlling what the tape watches
Intermediate results
Gradients of non-scalar targets
Cases where gradient returns None
References

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf

Computing gradients

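To differentiate automatically, TensorFlow needs to remember which operations happen, and in what order, during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.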
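The record-then-reverse idea can be sketched in pure Python. This is a hypothetical toy: the ToyTape class and its mul/add methods are invented for illustration, support only scalar multiplication and addition, and are not how TensorFlow implements tf.GradientTape.

```python
class ToyTape:
    """Hypothetical toy tape: records ops forward, replays them backward."""

    def __init__(self):
        # Each entry: (output_node, [(input_node, local_derivative), ...])
        self.ops = []

    def mul(self, a, b):
        node = {"value": a["value"] * b["value"]}
        # d(a*b)/da = b and d(a*b)/db = a
        self.ops.append((node, [(a, b["value"]), (b, a["value"])]))
        return node

    def add(self, a, b):
        node = {"value": a["value"] + b["value"]}
        # d(a+b)/da = d(a+b)/db = 1
        self.ops.append((node, [(a, 1.0), (b, 1.0)]))
        return node

    def gradient(self, target, source):
        grads = {id(target): 1.0}
        # Backward pass: walk the recorded operations in reverse order,
        # accumulating upstream gradient * local derivative (chain rule).
        for node, inputs in reversed(self.ops):
            upstream = grads.get(id(node), 0.0)
            for inp, local in inputs:
                grads[id(inp)] = grads.get(id(inp), 0.0) + upstream * local
        return grads.get(id(source), 0.0)


tape = ToyTape()
x = {"value": 3.0}
y = tape.mul(x, x)   # forward pass: y = x * x
z = tape.add(y, x)   # forward pass: z = x**2 + x
dz_dx = tape.gradient(z, x)
print(dz_dx)         # dz/dx = 2x + 1 = 7.0
```

Keying gradients by object identity lets the same input (x here) receive contributions from several recorded operations, which is why x collects 1.0 + 3.0 + 3.0.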

Gradient tapes

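TensorFlow provides the tf.GradientTape API for automatic differentiation, that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow "records" the relevant operations executed inside the context of a tf.GradientTape onto a "tape", and then uses that tape to compute gradients using reverse mode differentiation.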

Recording the operations:

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2

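Once some operations have been recorded, use GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) with respect to some source (often the model's variables):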

dy_dx = tape.gradient(y, x)
dy_dx.numpy()
"""
6.0
"""

The example above uses a scalar, but tf.GradientTape works just as easily on any tensor:

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

with tf.GradientTape(persistent=True) as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y**2)

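To get the gradient of loss with respect to both variables, pass both as sources to the gradient method. The tape is flexible about how sources are passed: it accepts any nested combination of lists or dictionaries and returns the gradients structured the same way. (The tape above is created with persistent=True because gradient is called on it more than once in this section.)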

The gradient with respect to each source has the shape of the source:

[dl_dw, dl_db] = tape.gradient(loss, [w, b])
print(w.shape)
print(dl_dw.shape)
"""
(3, 2)
(3, 2)
"""

A dictionary of variables can also be passed as the sources:

source = {
    'w': w,
    'b': b
}

grad = tape.gradient(loss, source)
grad['b']
"""
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-0.85382605, -4.2623644 ], dtype=float32)>
"""

Gradients with respect to a model

It is common to collect tf.Variables into a tf.Module or one of its subclasses (such as layers.Layer or keras.Model) for checkpointing and exporting.

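In most cases, you will want to calculate gradients with respect to a model's trainable variables. Since all subclasses of tf.Module aggregate their variables in the Module.trainable_variables property, calculating these gradients takes only a few lines of code: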

layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    y = layer(x)
    loss = tf.reduce_mean(y**2)

grad = tape.gradient(loss, layer.trainable_variables)

for var, gra in zip(layer.trainable_variables, grad):
    print(f'{var.name}, shape: {gra.shape}')
"""
dense/kernel:0, shape: (3, 2)
dense/bias:0, shape: (2,)
"""

Controlling what the tape watches

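By default, TensorFlow records all operations executed after accessing a trainable tf.Variable. The following example fails to compute some of the gradients, because a tf.Tensor is not watched by tf.GradientTape by default, and neither is a tf.Variable whose trainable attribute is set to False: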

# A trainable variable
x0 = tf.Variable(3.0, name='x0')
# Not trainable
x1 = tf.Variable(3.0, name='x1', trainable=False)
# Not a Variable: A variable + tensor returns a tensor.
x2 = tf.Variable(2.0, name='x2') + 1.0
# Not a variable
x3 = tf.constant(3.0, name='x3')

with tf.GradientTape() as tape:
    y = (x0**2) + (x1**2) + (x2**2)

grad = tape.gradient(y, [x0, x1, x2, x3])

for g in grad:
    print(g)
"""
tf.Tensor(6.0, shape=(), dtype=float32)
None
None
None
"""

To record gradients with respect to a tf.Tensor, you need to call GradientTape.watch(x):

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 2

dy_dx = tape.gradient(y, x)
dy_dx.numpy()
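The opposite is also possible: passing watch_accessed_variables=False to tf.GradientTape disables automatic watching, so only what is passed to tape.watch is recorded. A sketch with two variables where only x1 is opted in (the sin/softplus functions are arbitrary choices):

```python
import tensorflow as tf

x0 = tf.Variable(0.0)
x1 = tf.Variable(10.0)

# Turn off automatic watching, then watch exactly one variable.
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(x1)
    y = tf.math.sin(x0) + tf.nn.softplus(x1)

grad = tape.gradient(y, {'x0': x0, 'x1': x1})
print(grad['x0'])          # None: x0 was never watched
print(grad['x1'].numpy())  # softplus'(10) = sigmoid(10), roughly 0.99995
```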

Intermediate results

You can also request gradients of the output with respect to intermediate values computed inside the tf.GradientTape context:

x = tf.constant(3.0)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x
    z = y * y

# dz_dy = 2 * y and y = x ** 2 = 9, so this prints 18.0
print(tape.gradient(z, y).numpy())
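A single gradient call can also take several sources at once, including an intermediate value, so a non-persistent tape is enough here. A sketch using the same computation:

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x
    z = y * y

# One call, two sources: the results come back in the same list order.
dz_dx, dz_dy = tape.gradient(z, [x, y])
print(dz_dx.numpy(), dz_dy.numpy())  # 4 * x**3 = 108.0 and 2 * y = 18.0
```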

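By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient method is called. To compute multiple gradients over the same computation, create the tape with persistent=True. This allows multiple calls to the gradient method; the resources are released when the tape object is garbage collected. For example: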

x = tf.constant([1, 3.0])
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = x * x
    z = y * y

print(tape.gradient(z, x).numpy())  # [4.0, 108.0] (4 * x**3 at x = [1.0, 3.0])
print(tape.gradient(y, x).numpy())  # [2.0, 6.0] (2 * x at x = [1.0, 3.0])
"""
[  4. 108.]
[2. 6.]
"""

Gradients of non-scalar targets

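A gradient is fundamentally an operation on a scalar. So if you ask for the gradient of multiple targets, the result for each source is the sum of the gradients of each target (equivalently, the gradient of the sum of the targets):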

x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y0 = x**2
    y1 = 1 / x

print(tape.gradient({'y0': y0, 'y1': y1}, x).numpy())
"""
3.75
"""

Similarly, if the target is not a scalar, the gradient of its sum is computed:

x = tf.Variable(2.)

with tf.GradientTape() as tape:
    y = x * [3., 4.]

print(tape.gradient(y, x).numpy())
"""
7.0
"""

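Getting a separate gradient for each item requires a Jacobian. In some cases, though, you can skip the Jacobian: for an element-wise calculation, the gradient of the sum gives the derivative of each element with respect to its input element, since each element is independent: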

x = tf.linspace(-10.0, 10.0, 200+1)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.sigmoid(x)

dy_dx = tape.gradient(y, x)

plt.plot(x, y, label='y')
plt.plot(x, dy_dx, label='dy/dx')
plt.legend()
_ = plt.xlabel('x')

(Figure: y = sigmoid(x) and its gradient dy/dx plotted for x from -10 to 10.)
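Because sigmoid is applied element-wise, the gradient of the sum should match the known derivative sigmoid'(x) = y * (1 - y) at every point. A sketch that measures the largest element-wise difference (max_err is a name introduced here):

```python
import tensorflow as tf

x = tf.linspace(-10.0, 10.0, 200 + 1)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.sigmoid(x)
dy_dx = tape.gradient(y, x)

# Compare against the closed-form derivative of the sigmoid.
max_err = tf.reduce_max(tf.abs(dy_dx - y * (1.0 - y))).numpy()
print(max_err)
```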

Cases where gradient returns None

When a target is not connected to a source, gradient returns None:

x = tf.Variable(2.)
y = tf.Variable(3.)

with tf.GradientTape() as tape:
    z = y * y
print(tape.gradient(z, x))
"""
None
"""

A gradient can also become disconnected in several less obvious ways:

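Replacing a variable with a tensor

The tape automatically watches a tf.Variable but not a tf.Tensor. A common mistake is to replace a tf.Variable with a tf.Tensor by accident (here via x = x + 1) instead of updating it in place, so on the second iteration x is a plain tensor and the gradient is None: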

x = tf.Variable(2.0)

for epoch in range(2):
    with tf.GradientTape() as tape:
        y = x+1

    print(type(x).__name__, ":", tape.gradient(y, x))
    x = x + 1   # This should be `x.assign_add(1)`
"""
ResourceVariable : tf.Tensor(1.0, shape=(), dtype=float32)
EagerTensor : None
"""

Doing calculations outside of TensorFlow

The tape cannot record the gradient path if the calculation exits TensorFlow:

x = tf.Variable([[1.0, 2.0],
                 [3.0, 4.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
    x2 = x ** 2

    # This step is calculated with NumPy
    y = np.mean(x2, axis=0)

    # Like most ops, reduce_mean will cast the NumPy array to a constant tensor
    # using `tf.convert_to_tensor`.
    y = tf.reduce_mean(y, axis=0)

print(tape.gradient(y, x))
"""
None
"""

Taking gradients through an integer or string

Integers and strings are not differentiable. If a calculation path uses these data types, there will be no gradient.

x = tf.constant(10)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x

print(tape.gradient(y, x))
"""
None
"""

References

TensorFlow official website, https://tensorflow.google.cn/guide/autodiff.
