Using pytesseract for OCR: Extracting Latitude and Longitude from a Fixed Image Region

2026/2/14 23:35:19

Introduction

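In applications such as intelligent transportation and map-based positioning, we often need to extract latitude and longitude values that are rendered onto an image. This article shows how to use Python's pytesseract library together with PIL (Pillow) to crop a fixed region of the image, preprocess it, and reliably recognize the coordinates displayed there.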
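The fixed-region idea comes down to turning a center point and a size into a crop box. Below is a minimal pure-Python sketch of that arithmetic. The helper name `center_to_box` and the clamping step are my own additions, not part of the original code, and the 1920x1080 image size is an assumption. Clamping is worth considering because, as far as I know, PIL's `Image.crop` accepts a box that extends past the image and fills the out-of-range area with black, which can add dark borders that confuse the OCR engine.

```python
def center_to_box(center_x, center_y, width, height, img_w, img_h):
    """Hypothetical helper: convert a center point plus width/height into a
    PIL-style (left, top, right, bottom) box, clamped to the image bounds."""
    left = center_x - width // 2
    top = center_y - height // 2
    # Derive right/bottom from left/top so odd widths/heights stay exact.
    right = left + width
    bottom = top + height
    # Clamp so the box never extends past the image edges.
    left, top = max(0, left), max(0, top)
    right, bottom = min(img_w, right), min(img_h, bottom)
    if left >= right or top >= bottom:
        raise ValueError("crop region lies entirely outside the image")
    return left, top, right, bottom


# Region values from the example at the end of the article; image size assumed.
print(center_to_box(1440, 802, 204, 20, 1920, 1080))  # (1338, 792, 1542, 812)
```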

1. An overview of OCR and pytesseract

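OCR (Optical Character Recognition) converts the text in an image into editable text. Tesseract is a capable open-source OCR engine with good accuracy, and pytesseract is a Python wrapper around it that makes Tesseract easy to call from a Python project. Note that pytesseract does not bundle the engine itself: the tesseract executable must be installed separately and be on the PATH (or be pointed to via pytesseract.pytesseract.tesseract_cmd).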
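Because pytesseract is a wrapper, every call ends up as a command line for the tesseract executable. As far as I can tell from pytesseract's source, it writes the image to a temporary file and, on non-Windows systems, tokenizes the `config` string with `shlex.split` before appending it to that command line. The sketch below reproduces only this tokenizing step with the standard library, to show what the option string used later in this article turns into. `tesseract_argv` and the file names are hypothetical, and the real command pytesseract builds contains additional arguments.

```python
import shlex


def tesseract_argv(input_path, output_base, config):
    """Hypothetical helper: a simplified version of the command line
    pytesseract builds, i.e. executable, input file, output base name,
    then the config string split into arguments with shlex (POSIX rules)."""
    return ["tesseract", input_path, output_base] + shlex.split(config)


whitelist = "0123456789°.NSEW"
config = f"--psm 6 -c tessedit_char_whitelist={whitelist}"
print(tesseract_argv("crop.png", "out", config))
# ['tesseract', 'crop.png', 'out', '--psm', '6', '-c', 'tessedit_char_whitelist=0123456789°.NSEW']
```

Because the string is split on whitespace, an unquoted space inside the whitelist would become a separate stray argument, which is one reason to keep the whitelist free of spaces.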

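In our example, we process and recognize a latitude/longitude region at a fixed position in the image. Since the coordinates contain only digits, the decimal point, the degree sign (°), and the direction letters N, S, E, and W, we can restrict Tesseract to exactly those characters with a whitelist, which improves recognition accuracy.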
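The whitelist is a request to Tesseract rather than a guarantee: depending on the Tesseract version and engine mode it may not be honored (the LSTM engine in Tesseract 4.0 is commonly reported to ignore it, with support restored in 4.1). A cheap safety net is to drop any character outside the whitelist after recognition. A minimal sketch follows; `filter_to_whitelist` is a hypothetical helper, not part of pytesseract, and it assumes whitespace should be kept as a single separator between the latitude and longitude.

```python
def filter_to_whitelist(text, whitelist="0123456789°.NSEW"):
    """Hypothetical post-filter: keep only whitelisted characters, turn any
    run of whitespace into a single space, and trim the ends."""
    allowed = set(whitelist)
    # Whitelisted characters are kept, whitespace becomes a space, anything else is dropped.
    kept = "".join(
        ch if ch in allowed else (" " if ch.isspace() else "")
        for ch in text
    )
    # Collapse runs of whitespace and trim leading/trailing spaces.
    return " ".join(kept.split())


print(filter_to_whitelist("30.2741°N,\n 120.1551°E\n"))  # 30.2741°N 120.1551°E
```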

2. Example code

The complete example follows, with comments explaining each step:

import time
import pytesseract
from PIL import Image, ImageFilter, ImageEnhance

class OCRReader:
    def __init__(self, center_x, center_y, width, height, sharpness=2.0, contrast=2.0, blur_radius=1):
        """
        初始化 OCRReader 类，使用中心点和宽高设置裁剪区域的参数，并配置图像预处理的超参数。
        
        参数：
          center_x (int): 经度/纬度信息区域中心点的 x 坐标（从左向右）
          center_y (int): 经度/纬度信息区域中心点的 y 坐标（从上向下）
          width (int): 裁剪区域的宽度
          height (int): 裁剪区域的高度
          sharpness (float): 锐化处理的增强系数，数字越大效果越明显
          contrast (float): 对比度增强系数，数字越大表示对比度越明显
          blur_radius (float): 高斯模糊的半径，主要用于图像降噪
        """
        self.center_x = center_x
        self.center_y = center_y
        self.width = width
        self.height = height
        self.sharpness = sharpness
        self.contrast = contrast
        self.blur_radius = blur_radius
        # For coordinates, the whitelist holds only digits, the degree sign, the decimal point, and direction letters
        self.whitelist = "0123456789°.NSEW"

    def preprocess_image(self, img):
        """
        对裁剪后的图像进行预处理：包括图像的锐化、对比度增强以及高斯模糊降噪。
        
        参数：
          img (Image): PIL 图像对象
        
        返回：
          Image: 预处理后的图像对象
        """
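        # Sharpen to bring out fine detail
        sharpener = ImageEnhance.Sharpness(img)
        img = sharpener.enhance(self.sharpness)

        # Boost contrast so the text stands out more
        enhancer = ImageEnhance.Contrast(img)
        img = enhancer.enhance(self.contrast)

        # Apply a Gaussian blur to reduce noise (this also softens the sharpening
        # above, so tune blur_radius together with sharpness)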
        if self.blur_radius > 0:
            img = img.filter(ImageFilter.GaussianBlur(self.blur_radius))

        return img

    def read_coordinates(self, image_path):
        """
        从给定图像文件中提取经纬度信息。
        
        参数：
          image_path (str): 图像文件的路径
        
        返回：
          str: OCR 识别出的文本
        """
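        # Load the image
        img = Image.open(image_path)

        # Flatten any transparency onto a white background and normalize the mode,
        # so the JPEG debug saves and the enhancement steps below always work
        if img.mode == 'RGBA':
            background = Image.new('RGB', img.size, (255, 255, 255))
            background.paste(img, mask=img.split()[3])
            img = background
        elif img.mode == 'LA':
            background = Image.new('L', img.size, 255)
            background.paste(img, mask=img.split()[1])
            img = background.convert('RGB')
        elif img.mode not in ('RGB', 'L'):
            # e.g. palette ('P') PNGs, which cannot be saved as JPEG directly
            img = img.convert('RGB')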

        # Compute the top-left and bottom-right corners of the crop box from the
        # center point; deriving right/bottom from left/top keeps odd sizes exact
        left = self.center_x - self.width // 2
        top = self.center_y - self.height // 2
        right = left + self.width
        bottom = top + self.height

        # Crop out the region that displays the coordinates
        cropped_img = img.crop((left, top, right, bottom))
        cropped_img.save('sub_img.jpg')  # save the cropped image for debugging

        # Preprocess the cropped image
        processed_img = self.preprocess_image(cropped_img)
        processed_img.save('processed_sub_img.jpg')  # save the preprocessed image for debugging

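        # Tesseract options:
        # --psm 6 treats the image as a single uniform block of text
        # (--psm 7, a single text line, may suit a one-line overlay better)
        # tessedit_char_whitelist restricts recognition to the given character set
        custom_config = f'--psm 6 -c tessedit_char_whitelist={self.whitelist}'
        # timeout=1: pytesseract stops Tesseract after 1 second and raises RuntimeError
        result = pytesseract.image_to_string(processed_img, config=custom_config, timeout=1)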
        return result

# Example usage
if __name__ == '__main__':
    ocr_reader = OCRReader(center_x=1440, center_y=802, width=204, height=20)
    t1 = time.time()
    result = ocr_reader.read_coordinates('./ocr_test.png')
    print("\nRecognition result:", result)
    print(f"Time: {time.time() - t1}")