Deep Learning Image Processing 02: The Tensor Data Type

news2025/4/18 16:29:01

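In the previous installment, Deep Learning Image Processing 01: The Nature of Images, we saw that image processing is, at its core, a set of operations on matrices. In this installment we introduce the basic data type used in deep learning image processing: the Tensor.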

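In deep learning, the Tensor is the core data structure for representing and processing data. This article covers the Tensor data type in detail: what it is, how to create and manipulate Tensors, and how it is used in image processing and deep learning. We work through examples, mathematical formulas, and Python code along the way.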

1. What Is a Tensor?

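Put simply, a Tensor is a multi-dimensional array: the generalization of scalars, vectors, and matrices to higher dimensions. In deep learning, Tensors are the basic building blocks of algorithms and are used to represent and process data such as images, audio, and text.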

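0-D Tensor: a scalar, for example a single number.
1-D Tensor: a vector, for example a list of numbers.
2-D Tensor: a matrix, for example a two-dimensional array.
n-D Tensor: an array with more dimensions, for example a color image with channel, height, and width axes.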
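
The ranks listed above can be checked directly in PyTorch. The following is a minimal sketch; the variable names and the 8×3×32×32 batch size are assumptions chosen purely for illustration.

```python
import torch

# One Tensor per rank described in the list above.
scalar = torch.tensor(3.14)                 # 0-D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])      # 1-D: a list of numbers
matrix = torch.tensor([[1, 2], [3, 4]])     # 2-D: a two-dimensional array
# Hypothetical image batch: 8 RGB images of 32x32 pixels (sizes are illustrative).
batch = torch.zeros(8, 3, 32, 32)           # 4-D

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("batch", batch)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```

Here `ndim` reports the number of dimensions and `shape` reports the size along each one; a 0-D Tensor has an empty shape.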

2. Basic Tensor Operations

2.1 Creating Tensors

Here are some basic examples of creating Tensors with PyTorch:

import torch

# Create an uninitialized Tensor (its values are whatever happens to be in memory)
x = torch.empty(5, 3)
print("Empty Tensor:\n", x)

# Create a randomly initialized Tensor (uniform on [0, 1))
x = torch.rand(5, 3)
print("Random Tensor:\n", x)

# Create a Tensor of zeros with dtype long
x = torch.zeros(5, 3, dtype=torch.long)
print("Zero Tensor:\n", x)

# Create a Tensor directly from data
x = torch.tensor([5.5, 3])
print("Tensor from data:\n", x)

Output (the uninitialized and random values will differ from run to run):

Empty Tensor:
tensor([[-2.2140e-26, 8.0995e-43, -2.2140e-26],
[ 8.0995e-43, -2.2140e-26, 8.0995e-43],
[-2.2140e-26, 8.0995e-43, -2.2140e-26],
[ 8.0995e-43, -2.2140e-26, 8.0995e-43],
[-2.2140e-26, 8.0995e-43, -2.2140e-26]])

Random Tensor:
tensor([[0.0027, 0.5500, 0.1749],
[0.9315, 0.9860, 0.2204],
[0.9672, 0.5092, 0.0170],
[0.1733, 0.3858, 0.8220],
[0.0176, 0.2597, 0.3035]])

Zero Tensor:
tensor([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])

Tensor from data:
tensor([5.5000, 3.0000])

2.2 Generating a Sample 2-D Matrix

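A two-dimensional matrix is a common data structure in image processing, where each element can represent one pixel value of an image. The following shows how to create one with PyTorch: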

import torch
# Create a random 5x3 two-dimensional matrix
matrix = torch.rand(5, 3)
print("Random Matrix:\n", matrix)

Output (the random values will differ from run to run):

Random Matrix:
tensor([[0.5890, 0.0223, 0.0294],
[0.9720, 0.1617, 0.6436],
[0.0723, 0.4613, 0.3779],
[0.2925, 0.7364, 0.7298],
[0.0533, 0.1137, 0.2888]])
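
Since each element can stand for a pixel, a matrix of 8-bit integers is a closer match to how a grayscale image is usually stored than a matrix of random floats. The sketch below is an illustration under that assumption; the 5x3 size, the seed, and the names `gray` and `gray_float` are hypothetical.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical 5x3 grayscale "image": integer pixel intensities in [0, 255].
# torch.randint's upper bound is exclusive, hence 256.
gray = torch.randint(0, 256, (5, 3), dtype=torch.uint8)
print("Pixel matrix:\n", gray)

# Convert to float and rescale to [0, 1], a common preprocessing step.
gray_float = gray.to(torch.float32) / 255.0
print("dtype:", gray_float.dtype)
print("min:", gray_float.min().item(), "max:", gray_float.max().item())
```

Converting to `float32` before dividing avoids doing arithmetic in the 8-bit integer type.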

2. Tensor Operations

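In deep learning and image processing, most of the work consists of operations on Tensor data. These include, but are not limited to, arithmetic, reshaping, indexing, and slicing, and they let us process and transform data efficiently to suit the needs of different algorithms.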

2.1 Arithmetic Operations

Tensors support a wide range of arithmetic operations, from element-wise operations to more involved ones such as matrix multiplication.

(1) Element-wise addition

Suppose two matrices A and B have the same shape, m×n. Element-wise addition can be written as: $C_{ij} = A_{ij} + B_{ij}$

(2) Matrix multiplication

Suppose A has shape m×p and B has shape p×n. Matrix multiplication can be written as: $C_{ij} = \sum_{k=1}^{p} A_{ik}B_{kj}$, where the sum runs over the shared dimension p and the result C has shape m×n.

The code below demonstrates element-wise addition and matrix multiplication in turn:

import torch

# Element-wise addition
x = torch.tensor([1, 2])
y = torch.tensor([3, 4])
print("Element-wise Sum:\n", x + y)

# Matrix multiplication
x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [7, 8]])
print("Matrix Multiplication:\n", torch.mm(x, y))

Output:

Element-wise Sum:
tensor([4, 6])

Matrix Multiplication:
tensor([[19, 22],
[43, 50]])
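
To connect the matrix multiplication formula to the library call, the sketch below transcribes the sum over the shared dimension as explicit loops and compares the result with `torch.mm`. It is written for illustration only (the names `A`, `B`, `C`, `m`, `p`, `n` mirror the formula and are not from the original code); explicit loops are far slower than the built-in operation.

```python
import torch

A = torch.tensor([[1, 2], [3, 4]])   # shape m x p = 2 x 2
B = torch.tensor([[5, 6], [7, 8]])   # shape p x n = 2 x 2
m, p = A.shape
n = B.shape[1]

# Direct transcription of C_ij = sum over k of A_ik * B_kj.
C = torch.zeros(m, n, dtype=A.dtype)
for i in range(m):
    for j in range(n):
        for k in range(p):
            C[i, j] += A[i, k] * B[k, j]

print(torch.equal(C, torch.mm(A, B)))  # the loop agrees with torch.mm
print(torch.equal(C, A @ B))           # @ is the matrix-multiplication operator
print(A * B)                           # * is element-wise, not matrix multiplication
```

The last line is a reminder that `A * B` multiplies matching elements, so it generally gives a different result from `A @ B`.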

2.2 Dimension Operations

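In deep learning we often need to manipulate a Tensor's dimensions: adding a dimension, removing one, or reordering them. This is essential for matching the input and output shapes that different layers expect.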

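The code first creates a two-dimensional matrix, then performs three operations in turn: adding a dimension, removing it again, and reordering the dimensions: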

import torch
# Create a two-dimensional matrix
x = torch.tensor([[1, 2], [3, 4]])
print(f'x:\n{x}')
print(f'Tensor x shape:{x.shape}')
print(f'*'*50)

# Add a dimension
x_unsqueeze = x.unsqueeze(0)  # insert a new dimension at position 0
print(f'x_unsqueeze:\n{x_unsqueeze}')
print("Tensor x_unsqueeze shape:", x_unsqueeze.shape)
print(f'*'*50)

# Remove a dimension
x_squeeze = x_unsqueeze.squeeze(0)  # remove dimension 0 (which has size 1)
print(f'x_squeeze:\n{x_squeeze}')
print("Tensor x_squeeze shape:\n", x_squeeze.shape)
print(f'*'*50)

# Reorder the dimensions
x_permute = x.unsqueeze(0).permute(2, 1, 0)  # shape (1, 2, 2) -> (2, 2, 1)
print(f'Tensor x_permute:\n{x_permute}')
print("Tensor x_permute shape:\n", x_permute.shape)
print(f'*'*50)

Output:

x:
tensor([[1, 2],
[3, 4]])
Tensor x shape:torch.Size([2, 2])
**************************************************
x_unsqueeze:
tensor([[[1, 2],
[3, 4]]])
Tensor x_unsqueeze shape: torch.Size([1, 2, 2])
**************************************************
x_squeeze:
tensor([[1, 2],
[3, 4]])
Tensor x_squeeze shape:
torch.Size([2, 2])
**************************************************
Tensor x_permute:
tensor([[[1],
[3]],

[[2],
[4]]])
Tensor x_permute shape:
torch.Size([2, 2, 1])
**************************************************
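
In image work these operations most often appear when converting between layouts. The sketch below assumes an image stored as height × width × channels and rearranges it into the batch × channels × height × width layout that PyTorch convolution layers expect. The sizes and the names `img_hwc`, `img_chw`, and `batch` are hypothetical.

```python
import torch

# Hypothetical image stored as height x width x channels (HWC); sizes are illustrative.
img_hwc = torch.rand(4, 6, 3)

# Move channels first: (H, W, C) -> (C, H, W).
img_chw = img_hwc.permute(2, 0, 1)

# Add a leading batch dimension: (C, H, W) -> (1, C, H, W).
batch = img_chw.unsqueeze(0)

print(img_hwc.shape, img_chw.shape, batch.shape)

# permute only reorders dimensions; the pixel values are unchanged.
print(torch.equal(img_chw[:, 1, 2], img_hwc[1, 2, :]))
```

The final check reads the same pixel (row 1, column 2) through both layouts and confirms the channel values match.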

2.3 Concatenating and Splitting Tensors

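During data preprocessing or post-processing of model outputs, we often need to concatenate or split Tensors. Concatenation joins separate pieces of data into a larger Tensor, while splitting breaks a Tensor into smaller chunks, such as batches, for processing.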

The following code demonstrates concatenating and splitting Tensors:

import torch
# Concatenate Tensors
x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [7, 8]])
z = torch.cat([x, y], dim=0)  # dim=0 stacks rows; dim=1 would join along columns
print("Tensors after concatenation:\n", z)
print('*'*50)

# Split a Tensor
x = torch.arange(1, 10)
x_split = torch.split(x, 3)  # each piece holds 3 elements
print("Tensors after split:")
for tensor in x_split:
    print(tensor)
print('*'*50)

Output:

Tensors after concatenation:
tensor([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
**************************************************
Tensors after split:
tensor([1, 2, 3])
tensor([4, 5, 6])
tensor([7, 8, 9])
**************************************************
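
Two variations on the example above are worth a quick look: concatenating along `dim=1`, and splitting when the length is not a multiple of the split size. The following is a small sketch; the 10-element Tensor `v` and the names `wide`, `parts`, `head`, and `tail` are chosen here for illustration.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [7, 8]])

# dim=1 joins the Tensors side by side: two 2x2 inputs give one 2x4 result.
wide = torch.cat([x, y], dim=1)
print(wide)

# When the length is not divisible by the split size, the last piece is shorter.
v = torch.arange(1, 11)            # 10 elements: 1..10
parts = torch.split(v, 3)          # pieces of size 3, 3, 3, 1
print([p.tolist() for p in parts])

# A list of sizes gives explicit control over each piece.
head, tail = torch.split(v, [8, 2])
print(head.tolist(), tail.tolist())
```

When a list of sizes is passed to `torch.split`, the sizes must add up to the length of the dimension being split.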

2.4 Broadcasting

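Broadcasting is a mechanism that allows Tensors of different shapes to take part in the same arithmetic operation. Two shapes are compatible when, compared from the last dimension backwards, each pair of sizes is either equal or one of them is 1 (or missing); in that case the size-1 dimensions are automatically expanded to match the other Tensor.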

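import torch

x = torch.tensor([1, 2, 3])        # shape (3,)
y = torch.tensor([[1], [2], [3]])  # shape (3, 1)
# Through broadcasting both operands are expanded to shape (3, 3):
# x becomes [[1, 2, 3], [1, 2, 3], [1, 2, 3]] and y becomes [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print("Broadcasted Addition:\n", x + y)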

Output:

Broadcasted Addition:
tensor([[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
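
To see what broadcasting does behind the scenes, the sketch below makes the expansion explicit with `expand` and then applies the same idea to an image-style case: subtracting one value per channel from a channels × height × width Tensor. The mean values and the names `x_full`, `y_full`, `img`, `mean`, and `centered` are placeholders assumed for illustration.

```python
import torch

x = torch.tensor([1, 2, 3])          # shape (3,)
y = torch.tensor([[1], [2], [3]])    # shape (3, 1)

# Shape the result will have, computed without doing the addition.
print(torch.broadcast_shapes(x.shape, y.shape))

# Making the implicit expansion explicit gives the same sum.
x_full = x.expand(3, 3)              # every row is [1, 2, 3]
y_full = y.expand(3, 3)              # every column is [1, 2, 3]
print(torch.equal(x + y, x_full + y_full))

# Image-style use: subtract one mean per channel from a (C, H, W) Tensor.
# The mean values below are placeholders chosen for illustration.
img = torch.ones(3, 2, 2)
mean = torch.tensor([0.5, 0.25, 0.0]).view(3, 1, 1)   # (3, 1, 1) broadcasts over H and W
centered = img - mean
print(centered[:, 0, 0])
```

Reshaping `mean` to `(3, 1, 1)` is what lines the per-channel values up with the channel axis; a plain shape of `(3,)` would instead be matched against the last (width) axis.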