Affine Transformations

news2025/4/13 3:13:39


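What Is an Affine Transformation?

An affine transformation is a transformation used in mathematics and computer graphics that consists of a linear map followed by a translation. It covers translation, rotation, scaling, shearing, and any combination of them. An affine transformation preserves the "affine properties" of geometric figures: parallel lines remain parallel, collinear points remain collinear, affine combinations of points (weighted averages whose weights sum to 1) map to the same combinations of the transformed points, and ratios of distances along a line are preserved. Angles and distances are not necessarily preserved.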

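In two dimensions, an affine transformation can be represented by a $2 \times 2$ matrix and a $2 \times 1$ translation vector. Specifically, given a point $(x, y)$, its new position $(x', y')$ after the transformation is obtained from the following formula:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}$$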

Here the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ describes rotation, scaling, and shearing, and the vector $\begin{pmatrix} e \\ f \end{pmatrix}$ describes translation.
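To make the roles of the six coefficients concrete, here is a minimal sketch that applies them to a single point. The function name `apply_affine_2d` and the sample values are assumptions chosen for this illustration; the rotation uses the usual mathematical convention (counter-clockwise in a y-up coordinate system).

```python
import math


def apply_affine_2d(point, a, b, c, d, e, f):
    """Return (x', y') = [[a, b], [c, d]] @ (x, y) + (e, f)."""
    x, y = point
    return (a * x + b * y + e, c * x + d * y + f)


# Hypothetical example: rotate by 90 degrees, then shift by (2, 0).
theta = math.radians(90)
a, b = math.cos(theta), -math.sin(theta)
c, d = math.sin(theta), math.cos(theta)
new_point = apply_affine_2d((1.0, 0.0), a, b, c, d, e=2.0, f=0.0)
print(new_point)  # approximately (2.0, 1.0)
```

The point (1, 0) is first rotated onto (0, 1) by the matrix part and then moved to (2, 1) by the translation part.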

The concept is the same in higher dimensions. In three dimensions, the transformation can be expressed as a $3 \times 3$ matrix plus a $3 \times 1$ translation vector.
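As a sketch of the three-dimensional case, the $3 \times 3$ matrix and the translation vector can be packed into a single $4 \times 4$ matrix that acts on homogeneous coordinates $(x, y, z, 1)$. The helper name `make_affine_3d` and the sample numbers are assumptions made for this example.

```python
import numpy as np


def make_affine_3d(A, t):
    """Pack a 3x3 linear part A and a length-3 translation t into a 4x4 matrix."""
    M = np.eye(4)
    M[:3, :3] = A
    M[:3, 3] = t
    return M


# Hypothetical example: scale by 2 along every axis, then translate by (1, 2, 3).
A = 2.0 * np.eye(3)
t = np.array([1.0, 2.0, 3.0])
M = make_affine_3d(A, t)

p = np.array([1.0, 1.0, 1.0])
p_h = np.append(p, 1.0)   # homogeneous coordinates (x, y, z, 1)
q = (M @ p_h)[:3]         # same result as A @ p + t
print(q)                  # [3. 4. 5.]
```

Packing everything into one matrix means several transformations can be chained with ordinary matrix multiplication.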

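Affine transformations are widely used in computer vision, image processing, computer graphics, geographic information systems (GIS), robotics, and related fields.

How to Perform an Affine Transformation

We can transform any image by multiplying its pixel coordinates, written in homogeneous form $(x, y, 1)$, by a suitable matrix. Some commonly used transformations and their matrices are shown in the figure below.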

[Figure: common transformations and their matrices]
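The standard $3 \times 3$ homogeneous matrices for the basic transformations can be sketched as small helper functions. The function names are assumptions for this illustration, and the rotation uses the usual counter-clockwise convention for a y-up coordinate system, which may differ in sign from the matrices used in the code later in this post.

```python
import numpy as np


def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)


def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)


def shear(kx, ky):
    return np.array([[1, kx, 0], [ky, 1, 0], [0, 0, 1]], dtype=float)


# Composition: the right-most matrix is applied first.
# Here: scale by 2, then rotate by 90 degrees, then move by (10, 0).
M = translation(10, 0) @ rotation(np.pi / 2) @ scaling(2, 2)
point = np.array([1, 0, 1])     # (x, y) = (1, 0) in homogeneous form
print(np.round(M @ point, 6))   # x' = 10, y' = 2
```

Because every matrix ends with the row [0, 0, 1], the product of any number of them is again an affine transformation.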
Below is a function that applies an affine transformation to an image using such a matrix.

import cv2
import numpy as np
import matplotlib.pyplot as plt


def warpAffine(image, M, output_shape):
    # M maps output pixel coordinates to input pixel coordinates (inverse mapping),
    # so every output pixel is filled exactly once and no holes appear.
    rows, cols, *_ = image.shape
    out_rows, out_cols, *_ = output_shape

    output = np.zeros(output_shape, dtype=image.dtype)

    for out_row in range(out_rows):
        for out_col in range(out_cols):
            # Calculate the corresponding pixel coordinates in the input image
            in_col, in_row, _ = np.dot(M, [out_col, out_row, 1]).astype(int)

            # Check if the pixel coordinates are within the bounds of the input image
            if 0 <= in_row < rows and 0 <= in_col < cols:
                output[out_row, out_col] = image[in_row, in_col]

    return output

if __name__ == '__main__':
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows correct colors
    img = cv2.cvtColor(cv2.imread('./panda.jpg'), cv2.COLOR_BGR2RGB)
    th = np.radians(45)
    # The last row of a homogeneous affine matrix is [0, 0, 1]
    M = np.float32([[np.cos(th), np.sin(th), -400], [-np.sin(th), np.cos(th), 250], [0, 0, 1]])
    shape = (img.shape[0] + 400, img.shape[1] + 400, img.shape[2])
    res = warpAffine(img, M, shape)

    plt.subplot(121)
    plt.title("Original")
    plt.imshow(img)
    plt.subplot(122)
    plt.title("Transformed")
    plt.imshow(res)
    plt.tight_layout()
    plt.show()

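The double loop in warpAffine is easy to follow but slow for large images. As a rough sketch, the same nearest-neighbour inverse mapping can be done for all pixels at once with NumPy. The name `warp_affine_vectorized` and the tiny synthetic image are assumptions made for this example; like the loop version, it treats M as a map from output coordinates to input coordinates.

```python
import numpy as np


def warp_affine_vectorized(image, M, output_shape):
    """Nearest-neighbour warp where M maps output (col, row, 1) to input (col, row)."""
    out_rows, out_cols = output_shape[:2]
    rows, cols = image.shape[:2]

    # Homogeneous coordinates of every output pixel, shape (3, out_rows * out_cols).
    out_r, out_c = np.indices((out_rows, out_cols))
    coords = np.stack([out_c.ravel(), out_r.ravel(), np.ones(out_r.size)])

    # Map all output pixels to input coordinates at once, truncating like astype(int).
    in_c, in_r = (np.asarray(M, dtype=float)[:2] @ coords).astype(int)

    # Keep only the pixels that land inside the input image.
    valid = (in_r >= 0) & (in_r < rows) & (in_c >= 0) & (in_c < cols)

    output = np.zeros(tuple(output_shape), dtype=image.dtype)
    output[out_r.ravel()[valid], out_c.ravel()[valid]] = image[in_r[valid], in_c[valid]]
    return output


# Tiny synthetic check: each output pixel samples the input one column to its right.
img = np.arange(12, dtype=np.uint8).reshape(3, 4)
M = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=float)
print(warp_affine_vectorized(img, M, img.shape))
```

In the printed result the image content appears shifted one column to the left, and the last column stays zero because its source pixels fall outside the input.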
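That works well, but how do we perform the transformation if we don't know which matrix to use?

If you can describe the transformation with three pairs of (x, y) points, a set of source points (src) and the positions they should move to (dst), the matrix is easy to compute.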

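Suppose your inputs look like this:

$$\text{SRC} = \begin{pmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{pmatrix}, \quad \text{DST} = \begin{pmatrix} x_1' & y_1' \\ x_2' & y_2' \\ x_3' & y_3' \end{pmatrix}$$

Then the equation you want to solve is:

$$\text{SRC} \cdot M^{T} = \text{DST}$$

Since SRC is a square matrix, and it is invertible as long as the three source points are not collinear, we can compute $M^{T}$ by matrix-multiplying inv(SRC) with DST. Transposing the result gives the $2 \times 3$ affine matrix M.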

Here is the Python function.

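import random


def getAffineTransform(src, dst):
    # Append a column of ones so that each row of src is (x, y, 1)
    src = np.array([[x, y, 1] for (x, y) in src])

    # Solve src @ X = dst for the 3x2 matrix X
    M = np.linalg.inv(src) @ dst
    # Alternative that avoids forming the inverse explicitly:
    # M = np.linalg.solve(src, dst)

    # Transpose into the usual 2x3 affine matrix
    return M.T

if __name__ == '__main__':
    rows, cols = img.shape[:2]

    def r():
        # Random offset of up to +/-15% of the image width
        return (random.random() - 0.5) * .3 * cols

    _from = np.float32([[0, 0], [cols, rows], [0, rows]])
    _to = np.float32([[r(), r()], [cols + r(), rows + r()], [r(), rows + r()]])

    M = getAffineTransform(_from, _to)
    # Append the homogeneous row [0, 0, 1] to get a 3x3 matrix
    M = np.vstack([M, [0, 0, 1]])
    dst = warpAffine(img, M, img.shape)

    plt.figure(figsize=(20, 10))
    plt.subplot(121)
    plt.title("Original")
    plt.imshow(img)
    plt.subplot(122)
    plt.title("Transformed")
    plt.imshow(dst)
    plt.tight_layout()
    plt.show()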

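A quick way to gain confidence in a matrix computed from point pairs is to push the source points back through it and compare with the destination points. The sketch below does this with a small standalone variant of the function; the name `get_affine_transform` and the point values are assumptions chosen so that the expected matrix is easy to read off. Note that the three source points must not be collinear, otherwise the system has no unique solution.

```python
import numpy as np


def get_affine_transform(src, dst):
    """Solve SRC @ X = DST for X and return M = X.T with shape (2, 3)."""
    src_h = np.hstack([np.asarray(src, dtype=float), np.ones((3, 1))])
    return np.linalg.solve(src_h, np.asarray(dst, dtype=float)).T


# Hypothetical point pairs: the destination is the source scaled by 2 and moved by (5, -3).
src = [(0, 0), (4, 0), (0, 2)]
dst = [(5, -3), (13, -3), (5, 1)]
M = get_affine_transform(src, dst)
print(np.round(M, 6))  # expected values: [[2, 0, 5], [0, 2, -3]]

# Each source point, in homogeneous form, should land on its destination point.
for (x, y), target in zip(src, dst):
    assert np.allclose(M @ [x, y, 1], target)
```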
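Now we know how to apply a transformation, but how do we reverse it?

We can run warpAffine on the image with the inverse of the affine matrix. Keep in mind that the image can be restored to its exact original form only if the transformed image still contains all of the original data; pixels that were moved outside the output bounds are lost.

Below is the code for invertAffineTransform: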

def invertAffineTransform(M):
    # Extract the linear part (rotation, scale, shear) and the translation
    R = M[:2, :2]
    t = M[:2, 2]

    # Compute the inverse of the linear part
    R_inv = np.linalg.inv(R)

    # If x' = R @ x + t, then x = R_inv @ x' - R_inv @ t
    M_inv = np.zeros((2, 3), dtype=np.float32)
    M_inv[:2, :2] = R_inv
    M_inv[:2, 2] = -np.dot(R_inv, t)

    return M_inv

if __name__ == '__main__':
    # M is the matrix that produced dst in the previous step
    M_inv = invertAffineTransform(M)
    M_inv = np.vstack([M_inv, [0, 0, 1]])

    res_inv = warpAffine(dst, M_inv, img.shape)

    plt.figure(figsize=(20, 10))
    plt.subplot(121)
    plt.title("Transformed")
    plt.imshow(dst)
    plt.subplot(122)
    plt.title("InvTransformed")
    plt.imshow(res_inv)
    plt.tight_layout()
    plt.show()

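The inversion formula can be checked numerically without any image: composing a transformation with its inverse should give the identity. The following is a minimal sketch; the helper names `invert_affine` and `to_3x3` and the sample transform are assumptions made for this example.

```python
import numpy as np


def invert_affine(M):
    """Invert a 2x3 affine matrix [A | t]; the inverse is [A^-1 | -A^-1 t]."""
    A, t = M[:, :2], M[:, 2]
    A_inv = np.linalg.inv(A)
    return np.hstack([A_inv, (-A_inv @ t).reshape(2, 1)])


def to_3x3(M):
    """Append the homogeneous row [0, 0, 1] to a 2x3 affine matrix."""
    return np.vstack([M, [0.0, 0.0, 1.0]])


# Hypothetical transform: rotate by 30 degrees, scale by 1.5, translate by (40, -10).
th = np.radians(30)
M = np.array([[1.5 * np.cos(th), -1.5 * np.sin(th), 40.0],
              [1.5 * np.sin(th), 1.5 * np.cos(th), -10.0]])
M_inv = invert_affine(M)

# Composing a transform with its inverse should give the identity.
print(np.allclose(to_3x3(M) @ to_3x3(M_inv), np.eye(3)))  # True
```

This only verifies the matrices. Round-tripping an actual image is still lossy wherever pixels were clipped or resampled.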

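Conclusion

I hope you now have a clear understanding of affine transformations and the math behind them.

In real-world projects you don't have to do all of this by hand; libraries such as OpenCV cover these needs. With OpenCV in Python, the algorithms explained above can be run as follows: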

import random

import cv2
import numpy as np

img = cv2.imread('<path-to-img>')
rows, cols, _ = img.shape

th = np.radians(20)
M = np.float32([[np.cos(th), np.sin(th), 100], [-np.sin(th), np.cos(th), 250]])
# cv2.warpAffine expects the output size as (width, height)
size = (cols + 400, rows + 400)

def r():
    return (random.random() - 0.5) * .3 * cols


res = cv2.warpAffine(img, M, size)


_from = np.float32([[0, 0], [cols, rows], [0, rows]])
_to = np.float32([[r(), r()], [cols + r(), rows + r()], [r(), rows + r()]])

M = cv2.getAffineTransform(_from, _to)
dst = cv2.warpAffine(img, M, size)

# Invert the transform that produced dst and warp dst back to the original size
M_inv = cv2.invertAffineTransform(M)
res_inv = cv2.warpAffine(dst, M_inv, (cols, rows))