Computer Vision: Document Skew Detection and Correction Based on the Fourier Magnitude Spectrum

2025/2/22 4:06:40

Overview

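In computer vision, the performance of OCR algorithms on document data is often degraded by document skew. If a document is not properly deskewed before it is fed to a model, the model cannot be expected to produce accurate predictions, or its accuracy will drop. For example, if an information extraction system passes a skewed image to an OCR model, the model may fail to recognize the text correctly, and the alignment of the text may also be lost. Documents that contain tables are especially sensitive: if the image skew is not corrected before table detection, the model may fail to locate the table boundaries and corners accurately.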

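Document skew is the tilt of a document image introduced during scanning or digital capture. It is usually caused by the capture environment or the device. Skew estimation is a crucial step in document processing systems, especially for scanned document images, because the accuracy of the estimate directly affects every subsequent processing step.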

Document Skew Correction

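The main approach extracts the dominant skew angle of a document image by applying an adaptive radial projection to its 2D discrete Fourier magnitude spectrum. First, the 2D discrete Fourier transform (DFT) converts the document image from the spatial domain to the frequency domain. The result is a spectrum in which the intensity at each point is the magnitude of one spatial frequency in the image. Text lines repeat at a fairly regular spacing, so their energy concentrates along a line through the center of the spectrum, perpendicular to the text lines. Rotating the image rotates the spectrum by the same angle, so the orientation of that line encodes the skew.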
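
A small numpy sketch of this idea follows. The "page" is synthetic and hypothetical: its brightness repeats every 8 rows, standing in for horizontal text lines. After fft2 and fftshift, the strongest non-DC energy sits on the vertical axis through the center, N/period rows away from it.

```python
import numpy as np


def centered_magnitude(img):
    """|DFT| of a 2D array with the zero-frequency term shifted to the center."""
    return np.abs(np.fft.fftshift(np.fft.fft2(img)))


# Toy "page": brightness varies only from row to row with a period of 8 rows,
# standing in for horizontal text lines (hypothetical data, not a real scan).
N, period = 128, 8
rows = np.arange(N)[:, None]
img = 0.5 * np.cos(2 * np.pi * rows / period) * np.ones((1, N))

mag = centered_magnitude(img)
c = N // 2
peak_row, peak_col = np.unravel_index(np.argmax(mag), mag.shape)
# The peak lies on the vertical axis (same column as the center), N/period rows away
print(peak_row - c, peak_col - c)
```

The cosine has zero mean, so the DC term is essentially zero. The two symmetric peaks at plus and minus 16 rows are the only significant energy.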

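Next, the Fourier magnitude spectrum is analyzed. In the magnitude spectrum, document skew shows up as a dominant direction, and identifying that direction gives an estimate of the skew angle.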
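
To see why a direction in the spectrum maps to a skew angle, consider tilted synthetic stripes (hypothetical data, chosen with integer frequencies so the peak falls exactly on a DFT bin). The angle of the spectral peak, measured from the vertical frequency axis, equals the tilt of the stripes.

```python
import numpy as np

N = 128
ky, kx = 8, 2  # hypothetical stripe frequency: cycles per image along y (rows) and x (columns)
y, x = np.mgrid[0:N, 0:N]
# Lines of constant phase ky*y + kx*x are stripes tilted by atan(kx/ky) from horizontal
img = np.cos(2 * np.pi * (ky * y + kx * x) / N)

mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
c = N // 2
row, col = np.unravel_index(np.argmax(mag), mag.shape)

# Direction of the peak measured from the vertical frequency axis, folded into
# [-90, 90) because |DFT| of a real image is point-symmetric about the center
peak_angle = np.degrees(np.arctan2(col - c, row - c))
peak_angle = (peak_angle + 90) % 180 - 90
stripe_tilt = np.degrees(np.arctan2(kx, ky))
print(round(peak_angle, 2), round(stripe_tilt, 2))
```

The folding step matters because a real image produces two mirror-image peaks. Either peak describes the same direction.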

Adaptive radial projection is the core contribution of the method. It consists of two separate steps:

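Initial radial projection: estimates a preliminary skew angle by projecting (summing) the magnitudes along radial lines that start at the center of the Fourier spectrum. The radial line with the highest projection value indicates the dominant direction of the text in the image, which gives the skew angle.
Correction projection: refines the initial estimate, because the initial projection may be affected by factors such as text alignment or non-text elements in the image. The correction projection adapts to these factors to give a more accurate skew estimate.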
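
The two-stage structure can be illustrated with a generic coarse-to-fine radial projection. The first pass sums the spectrum along rays at 1° steps. The second pass repeats the projection on a finer grid around the coarse winner. This is a stand-in, not the paper's correction projection, whose adaptive rules are described in the paper. The function names `radial_profile` and `coarse_to_fine_angle`, the 15° range, the step sizes, and the use of rounding (`np.rint`) instead of truncation are all assumptions for illustration.

```python
import numpy as np


def radial_profile(mag, angles_deg):
    """Sum a square, centered magnitude spectrum along one ray per candidate angle.

    Angle 0 points down the vertical frequency axis; the ray for angle t visits
    (c + x*cos t, c - x*sin t) for x = 0..c-1 (rounded to the nearest pixel).
    """
    c = mag.shape[0] // 2
    radii = np.arange(c)
    t = np.radians(np.asarray(angles_deg, dtype=float))[:, None]
    rows = c + np.rint(radii * np.cos(t)).astype(int)
    cols = c - np.rint(radii * np.sin(t)).astype(int)
    return mag[rows, cols].sum(axis=1)


def coarse_to_fine_angle(mag, angle_max=15.0, coarse_step=1.0, fine_num=41):
    """Coarse radial projection, then a finer one around the coarse winner."""
    coarse = np.arange(-angle_max, angle_max + coarse_step / 2, coarse_step)
    best = coarse[np.argmax(radial_profile(mag, coarse))]
    fine = np.linspace(best - coarse_step, best + coarse_step, fine_num)
    return fine[np.argmax(radial_profile(mag, fine))]


# Synthetic spectrum: a soft bright line through the center at 7.3 degrees
N = 256
c = N // 2
yy, xx = np.mgrid[0:N, 0:N] - c
t0 = np.radians(7.3)
dist = yy * np.sin(t0) + xx * np.cos(t0)  # signed distance from the line
mag = np.exp(-dist**2 / (2 * 2.0**2))
estimate = coarse_to_fine_angle(mag)
print(estimate)
```

The coarse pass keeps the number of rays small over the full range. The fine pass spends its rays only where the answer is likely to be.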

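Once radial projection has identified the dominant direction, the corresponding skew angle is computed. This is the angle by which the image must be rotated to align its text with the horizontal or vertical axis, which corrects the skew.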
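
The rotation step uses `cv2.getRotationMatrix2D`, whose documentation gives the 2x3 matrix explicitly. The hypothetical helper below reproduces that documented formula in numpy, so the sign convention can be checked without OpenCV. A positive angle rotates counter-clockwise on screen, with the origin at the top-left corner.

```python
import numpy as np


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix using the formula documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    a = scale * np.cos(np.radians(angle_deg))
    b = scale * np.sin(np.radians(angle_deg))
    return np.array([
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ])


def apply_affine(M, x, y):
    """Map pixel (x, y) through a 2x3 affine matrix."""
    return M @ np.array([x, y, 1.0])


M = rotation_matrix_2d((50, 40), 90)
print(apply_affine(M, 51, 40))  # one pixel right of center moves one pixel up
```

By our reading of the sampling convention used below (the ray for angle t visits (c + x*cos t, c - x*sin t)), text tilted counter-clockwise by θ produces its peak at t = -θ. The returned angle can then be passed straight to the rotation: rotating by -θ, i.e. clockwise, undoes the tilt.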

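To improve accuracy, the method includes additional steps, such as handling the DC component and the low-frequency components of the Fourier spectrum. These steps matter when dealing with different types of document images.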
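
How the paper treats the DC and low-frequency terms is not spelled out here. The sketch below shows just one simple possibility, not the paper's procedure: zero a small disk around the center of the shifted spectrum so that the page's mean brightness no longer dominates. The helper `suppress_low_frequencies` and its `radius=5` default are assumptions.

```python
import numpy as np


def suppress_low_frequencies(mag, radius=5):
    """Copy of a centered magnitude spectrum with a disk of `radius` around DC set to 0."""
    h, w = mag.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    out = mag.copy()
    out[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 0
    return out


# Toy page: bright background (large mean) plus weak row-periodic "text lines"
N = 128
rows = np.arange(N)[:, None]
img = 1.0 + 0.5 * np.cos(2 * np.pi * 16 * rows / N) * np.ones((1, N))
mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))

c = N // 2
before = np.unravel_index(np.argmax(mag), mag.shape)
after = np.unravel_index(np.argmax(suppress_low_frequencies(mag)), mag.shape)
print(before, after)
```

Before suppression the maximum is the DC term at the center. Afterwards it is the text-line peak on the vertical axis.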

For the experiments and the derivation of the algorithm, see the paper "Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation".

Code Implementation

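First, the _get_fft_magnitude() function computes the magnitude of the fast Fourier transform (FFT). It relies on two helpers: one converts the image to grayscale, and the other pads it to a square of an FFT-friendly size.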

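import cv2
import numpy as np


def _ensure_gray(image):
    # Convert BGR to single-channel grayscale; an image that is already
    # grayscale makes cvtColor raise cv2.error, so it is returned unchanged.
    try:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    except cv2.error:
        pass
    return image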


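def _ensure_optimal_square(image):
    # Pad the bottom/right edges with white (255) so the image becomes a square
    # whose side is a size cv2.getOptimalDFTSize considers efficient for the DFT.
    assert image is not None, image
    nw = nh = cv2.getOptimalDFTSize(max(image.shape[:2]))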
    output_image = cv2.copyMakeBorder(
        src=image,
        top=0,
        bottom=nh - image.shape[0],
        left=0,
        right=nw - image.shape[1],
        borderType=cv2.BORDER_CONSTANT,
        value=255,
    )
    return output_image


def _get_fft_magnitude(image):
    gray = _ensure_gray(image)
    opt_gray = _ensure_optimal_square(gray)

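    # Invert (~ turns dark text bright) and binarize with an adaptive Gaussian
    # threshold (block size 15, C=-10) so that text strokes dominate the spectrum.
    opt_gray = cv2.adaptiveThreshold(
        ~opt_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 15, -10
    )

    # 2D FFT, then shift the zero-frequency (DC) term to the center
    dft = np.fft.fft2(opt_gray)
    shifted_dft = np.fft.fftshift(dft)

    # magnitude (modulus) of the complex spectrum
    magnitude = np.abs(shifted_dft)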
    return magnitude
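
The raw magnitude spans several orders of magnitude and is dominated by the DC term, so it looks almost black when saved as an image. A common trick for inspecting it is log compression followed by rescaling to 0..255. The helper below, `magnitude_to_uint8`, is hypothetical and meant for visualization only; the projection step in this article works on the raw magnitude.

```python
import numpy as np


def magnitude_to_uint8(magnitude):
    """Log-compress an FFT magnitude array and rescale it to 0..255 for display."""
    log_mag = np.log1p(magnitude)
    log_mag -= log_mag.min()
    peak = log_mag.max()
    if peak > 0:
        log_mag /= peak
    return (log_mag * 255).astype(np.uint8)


demo = magnitude_to_uint8(np.array([[0.0, 1.0], [10.0, 1000.0]]))
print(demo)
```

The result can be written out with any image library, for example `cv2.imwrite`, to check visually that the bright line through the center is tilted.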

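Next, the skew angle is computed with a radial projection, which sums the Fourier magnitude along radial lines at a range of candidate angles: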

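def _get_angle_radial_projection(m, angle_max=None, num=None):
    """Get the skew angle via radial projection.

    :param m: square, centered (fftshift-ed) FFT magnitude spectrum
    :param angle_max: search range in degrees; candidates lie in [-angle_max, angle_max]
    :param num: number of candidate angles per degree
    :return: estimated skew angle in degrees
    """
    assert m.shape[0] == m.shape[1]
    r = c = m.shape[0] // 2

    if angle_max is None:
        angle_max = 15  # assumed default search range; the original left it unset

    if num is None:
        num = 20

    tr = np.linspace(-1 * angle_max, angle_max, int(angle_max * num * 2)) / 180 * np.pi

    # For each candidate angle t, sum m along the ray (c + x*cos(t), c - x*sin(t)),
    # x = 0..r-1. astype(int) truncates toward zero, like int() in a per-pixel loop.
    x = np.arange(r)
    rows = c + (x[None, :] * np.cos(tr)[:, None]).astype(int)
    cols = c + (-1 * x[None, :] * np.sin(tr)[:, None]).astype(int)
    li = m[rows, cols].sum(axis=1)

    best = np.argmax(li)
    # Return 0 when the maximum falls on the first (lower-boundary) candidate;
    # comparing the index avoids a fragile float equality check on the angle.
    if best == 0:
        return 0
    return tr[best] / np.pi * 180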
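
The `num` parameter sets the angular resolution. The candidate grid has int(2 * angle_max * num) points, so neighbouring rays are about 1/num degree apart. Each candidate costs r pixel lookups, so doubling `num` doubles the work. The numbers below use the default num=20 with an assumed angle_max of 15.

```python
import numpy as np

# Hypothetical settings: angle_max=15 degrees, num=20 candidates per degree
angle_max, num = 15, 20
candidates = np.linspace(-angle_max, angle_max, int(angle_max * num * 2))

n_angles = candidates.size                # number of rays to project
step_deg = candidates[1] - candidates[0]  # spacing between neighbouring rays
print(n_angles, step_deg)
```

Because the grid is symmetric with an even number of points, 0° itself is never a candidate. An unskewed page is reported as roughly ±step/2, here about 0.025°, which is negligible for deskewing.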

Once the skew angle has been obtained, it is used to correct the skew of the image:

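def get_skewed_angle(image, angle_max=None):
    # Minimal wrapper (assumed; the article calls this helper without showing it):
    # FFT magnitude followed by radial projection.
    return _get_angle_radial_projection(_get_fft_magnitude(image), angle_max=angle_max)


def correct_text_skewness(image):
    """
    Rotate the image by the estimated skew angle to deskew it.
    :param image: BGR or grayscale image (numpy array)
    :return: deskewed image of the same size
    """
    h, w = image.shape[:2]  # works for both color and grayscale input
    x_center, y_center = (w // 2, h // 2)

    # Find angle to rotate image
    rotation_angle = get_skewed_angle(image)
    print(f"[INFO]: Rotation angle is {rotation_angle}")

    # Rotate around the image center (positive angle = counter-clockwise in OpenCV)
    # and fill the exposed corners with white
    M = cv2.getRotationMatrix2D((x_center, y_center), rotation_angle, 1.0)
    borderValue = (255, 255, 255)

    rotated_image = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, borderValue=borderValue)
    return rotated_image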
    ...