Table of Contents
I. Creating the dataset
  1. Go to the PaddleOCR-release-2.7 directory
  2. Launch PPOCRLabel: activate the environment in a terminal
  3. Click the auto-annotation button in the lower left corner
  4. After confirming everything, export from the File menu in the upper left corner
  5. Create gen_ocr_train_val_test.py
II. Training the text detection model
  1. Download the pretrained model
  2. Configure the detection model config file
  3. Start training
  4. Test the trained model
III. Training the text recognition model
  1. Modify the recognition model config file
  2. Train the model
  3. Test the model
IV. Exporting to an inference model
  1. Run the export commands in the Anaconda terminal
  2. Verify with predict_system.py
V. Deploying the inference model on RK3588

This post documents my own workflow.
Prerequisites:
The machine already has the basic environment set up, including:
CUDA, cuDNN, PyTorch, conda, PPOCRLabel, and so on.
If anything is missing, see: ubuntu22.04安装PPOCRLabel-CSDN博客
and my earlier article: PaddleOCR环境搭建、模型训练、推理、部署全流程(Ubuntu系统)_随记1-CSDN博客
This post mainly improves and streamlines that earlier write-up.
I. Creating the dataset
1. Go to the PaddleOCR-release-2.7 directory
Create a data folder there; inside data, create a folder named images and place the images you want to label in it.
Also create data/test_images to hold the test images.
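The layout used throughout this post therefore looks like this:

PaddleOCR-release-2.7/
└── data/
    ├── images/        # images to be labeled with PPOCRLabel
    └── test_images/   # images reserved for testing the trained models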
2. Launch PPOCRLabel: activate the environment in a terminal
conda activate label4
PPOCRLabel --lang ch --kie True
Then open the File menu in the upper left corner and enable all three of the following options:
automatic export of label results, automatic re-recognition, and automatic saving of unsubmitted changes.
3. Click the auto-annotation button in the lower left corner
When auto annotation finishes, go through every image, check the detected boxes and recognized text, and correct any mistakes; after finishing an image, click Confirm in the lower right corner.
4. Once everything is confirmed, go back to the File menu in the upper left corner and click
Export Label Result and Export Recognition Result.
After that, several new files appear under /home/sxj/ppocr-1/data/images:
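On my machine these are the files PPOCRLabel writes alongside the images (names can vary slightly between PPOCRLabel versions):

images/
├── Label.txt       # detection annotations: image path, boxes and transcriptions
├── rec_gt.txt      # recognition annotations: cropped image path and transcription
├── crop_img/       # text regions cropped out for recognition training
└── fileState.txt   # tracks which images have been checked and confirmed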
5. Create gen_ocr_train_val_test.py
What it does: splits the detection and recognition data into training, validation, and test sets.
# coding:utf8
import os
import shutil
import random
import argparse


# Delete the previously split train/val/test folder (if any) and recreate it empty
def isCreateOrDeleteFolder(path, flag):
    flagPath = os.path.join(path, flag)
    if os.path.exists(flagPath):
        shutil.rmtree(flagPath)
    os.makedirs(flagPath)
    flagAbsPath = os.path.abspath(flagPath)
    return flagAbsPath


def splitTrainVal(root, absTrainRootPath, absValRootPath, absTestRootPath, trainTxt, valTxt, testTxt, flag):
    # Split into training, validation, and test sets according to the given ratio
    dataAbsPath = os.path.abspath(root)

    if flag == "det":
        labelFilePath = os.path.join(dataAbsPath, args.detLabelFileName)
    elif flag == "rec":
        labelFilePath = os.path.join(dataAbsPath, args.recLabelFileName)

    labelFileRead = open(labelFilePath, "r", encoding="UTF-8")
    labelFileContent = labelFileRead.readlines()
    random.shuffle(labelFileContent)
    labelRecordLen = len(labelFileContent)

    for index, labelRecordInfo in enumerate(labelFileContent):
        imageRelativePath = labelRecordInfo.split('\t')[0]
        imageLabel = labelRecordInfo.split('\t')[1]
        imageName = os.path.basename(imageRelativePath)

        if flag == "det":
            imagePath = os.path.join(dataAbsPath, imageName)
        elif flag == "rec":
            imagePath = os.path.join(dataAbsPath, "{}/{}".format(args.recImageDirName, imageName))

        # Assign the sample to train/val/test according to the preset ratio
        trainValTestRatio = args.trainValTestRatio.split(":")
        trainRatio = eval(trainValTestRatio[0]) / 10
        valRatio = trainRatio + eval(trainValTestRatio[1]) / 10
        curRatio = index / labelRecordLen

        if curRatio < trainRatio:
            imageCopyPath = os.path.join(absTrainRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            trainTxt.write("{}\t{}".format(imageCopyPath, imageLabel))
        elif curRatio >= trainRatio and curRatio < valRatio:
            imageCopyPath = os.path.join(absValRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            valTxt.write("{}\t{}".format(imageCopyPath, imageLabel))
        else:
            imageCopyPath = os.path.join(absTestRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            testTxt.write("{}\t{}".format(imageCopyPath, imageLabel))


# Remove the file if it already exists
def removeFile(path):
    if os.path.exists(path):
        os.remove(path)


def genDetRecTrainVal(args):
    detAbsTrainRootPath = isCreateOrDeleteFolder(args.detRootPath, "train")
    detAbsValRootPath = isCreateOrDeleteFolder(args.detRootPath, "val")
    detAbsTestRootPath = isCreateOrDeleteFolder(args.detRootPath, "test")
    recAbsTrainRootPath = isCreateOrDeleteFolder(args.recRootPath, "train")
    recAbsValRootPath = isCreateOrDeleteFolder(args.recRootPath, "val")
    recAbsTestRootPath = isCreateOrDeleteFolder(args.recRootPath, "test")

    removeFile(os.path.join(args.detRootPath, "train.txt"))
    removeFile(os.path.join(args.detRootPath, "val.txt"))
    removeFile(os.path.join(args.detRootPath, "test.txt"))
    removeFile(os.path.join(args.recRootPath, "train.txt"))
    removeFile(os.path.join(args.recRootPath, "val.txt"))
    removeFile(os.path.join(args.recRootPath, "test.txt"))

    detTrainTxt = open(os.path.join(args.detRootPath, "train.txt"), "a", encoding="UTF-8")
    detValTxt = open(os.path.join(args.detRootPath, "val.txt"), "a", encoding="UTF-8")
    detTestTxt = open(os.path.join(args.detRootPath, "test.txt"), "a", encoding="UTF-8")
    recTrainTxt = open(os.path.join(args.recRootPath, "train.txt"), "a", encoding="UTF-8")
    recValTxt = open(os.path.join(args.recRootPath, "val.txt"), "a", encoding="UTF-8")
    recTestTxt = open(os.path.join(args.recRootPath, "test.txt"), "a", encoding="UTF-8")

    splitTrainVal(args.datasetRootPath, detAbsTrainRootPath, detAbsValRootPath, detAbsTestRootPath, detTrainTxt, detValTxt,
                  detTestTxt, "det")

    for root, dirs, files in os.walk(args.datasetRootPath):
        for dir in dirs:
            if dir == 'crop_img':
                splitTrainVal(root, recAbsTrainRootPath, recAbsValRootPath, recAbsTestRootPath, recTrainTxt, recValTxt,
                              recTestTxt, "rec")
            else:
                continue
        break


if __name__ == "__main__":
    # Split the detection and recognition data into training, validation, and test sets.
    # Note: adjust the arguments to your own paths and needs. Image data is often labeled in batches by several
    # people, with each batch placed in its own folder and annotated with PPOCRLabel, so the annotated folders
    # then need to be merged and split into training, validation, and test sets.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--trainValTestRatio",
        type=str,
        default="6:2:2",
        help="ratio of trainset:valset:testset")
    parser.add_argument(
        "--datasetRootPath",
        type=str,
        default="./data/",
        help="path to the dataset marked by ppocrlabel, E.g, dataset folder named 1,2,3...")
    parser.add_argument(
        "--detRootPath",
        type=str,
        default="./data/det",
        help="the path where the divided detection dataset is placed")
    parser.add_argument(
        "--recRootPath",
        type=str,
        default="./data/rec",
        help="the path where the divided recognition dataset is placed")
    parser.add_argument(
        "--detLabelFileName",
        type=str,
        default="Label.txt",
        help="the name of the detection annotation file")
    parser.add_argument(
        "--recLabelFileName",
        type=str,
        default="rec_gt.txt",
        help="the name of the recognition annotation file")
    parser.add_argument(
        "--recImageDirName",
        type=str,
        default="crop_img",
        help="the name of the folder where the cropped recognition dataset is located")

    args = parser.parse_args()
    genDetRecTrainVal(args)
Run it:
python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath ./data/images
When it finishes, two new folders appear under data: det and rec.
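For reference, the script creates this structure (train/, val/, test/ hold the copied images; the .txt files hold the matching labels):

data/
├── det/
│   ├── train/   val/   test/
│   ├── train.txt
│   ├── val.txt
│   └── test.txt
└── rec/
    ├── train/   val/   test/
    ├── train.txt
    ├── val.txt
    └── test.txt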
II. Training the text detection model
For available models, see the PaddleOCR model list; a PP-OCR detection model is used here as the pretrained model
(I keep the ones I use most often collected here for quick reference).
1. Download the pretrained model
After downloading, create a pretrain_models folder in the PaddleOCR-release-2.7 root directory and extract the downloaded training model into it.
2. Configure the detection model config file
Find the ch_det_res18_db_v2.0.yml config file under configs/det/ch_ppocr_v2.0/.
My ch_det_res18_db_v2.0.yml:
Global:
  use_gpu: true                           # use GPU; set to false if none is available
  epoch_num: 50                           # number of training epochs
  log_smooth_window: 20
  print_batch_step: 2                     # print a training log every N iterations
  save_model_dir: ./output/ch_db_res18/   # directory where trained models are saved
  save_epoch_step: 50                     # save a checkpoint every N epochs
  # evaluation is run every 2000 iterations after the 3000th iteration
  eval_batch_step: [3000, 2000]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_det_train/best_accuracy.pdparams  # path to the pretrained model downloaded above
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt

Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: ResNet_vd
    layers: 18
    disable_se: True
  Neck:
    name: DBFPN
    out_channels: 256
  Head:
    name: DBHead
    k: 50

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/                 # dataset root directory
    label_file_list:
      - ./data/det/train.txt          # training label file
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - IaaAugment:
          augmenter_args:
            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [960, 960]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 2
    num_workers: 2

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/                 # dataset root directory
    label_file_list:
      - ./data/det/val.txt            # evaluation label file
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
          # image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 2
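Before launching training, I like to sanity-check that the paths referenced in the config actually exist. A minimal sketch, assuming the label files written by the split script above (which stores absolute image paths) and the pretrained model path from my config:

import os

# Paths mirror ch_det_res18_db_v2.0.yml above; adjust them if yours differ.
pretrained = "./pretrain_models/ch_PP-OCRv4_det_train/best_accuracy.pdparams"
label_files = ["./data/det/train.txt", "./data/det/val.txt"]

print("pretrained model found:", os.path.exists(pretrained))

for label_file in label_files:
    total, missing = 0, 0
    with open(label_file, "r", encoding="utf-8") as f:
        for line in f:
            img_path = line.rstrip("\n").split("\t")[0]  # label lines are "<image path>\t<label>"
            total += 1
            if not os.path.exists(img_path):
                missing += 1
    print(f"{label_file}: {total} samples, {missing} missing image files")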
3. Start training
Open an Anaconda terminal, activate the environment, and go to the PaddleOCR-release-2.7 root directory.
Run the following command to start training:
python tools/train.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml
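If training gets interrupted, it can be resumed from the last saved checkpoint by overriding Global.checkpoints (the same -o override mechanism used below for testing); on my setup that looks roughly like:
python tools/train.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o Global.checkpoints=./output/ch_db_res18/latest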
4. Test the trained model
The trained models are saved under ./output/ch_db_res18/.
Use best_accuracy.pdparams for testing; if it is not there (training did not run long enough to trigger an evaluation), test with latest.pdparams instead.
Run the following command in the Anaconda terminal, where Global.pretrained_model is the trained model to test and Global.infer_img is the path of the image to run detection on:
python tools/infer_det.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o Global.pretrained_model=output/ch_db_res18/latest.pdparams Global.infer_img="./data/test_images/1.jpg"
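Besides printing results, this records the detected boxes in the save_res_path from the config (./output/det_db/predicts_db.txt). To look at the boxes on the original images, a sketch like the one below works for me; it assumes each line has the form "<image path>\t<JSON list of {"points": ...}>", so check your own output file first:

import json
import os

import cv2
import numpy as np

res_file = "./output/det_db/predicts_db.txt"   # save_res_path from the config above
out_dir = "./output/det_db/vis"
os.makedirs(out_dir, exist_ok=True)

with open(res_file, "r", encoding="utf-8") as f:
    for line in f:
        img_path, boxes_json = line.rstrip("\n").split("\t", 1)
        img = cv2.imread(img_path)
        if img is None:
            continue
        # Draw each detected polygon onto the source image
        for item in json.loads(boxes_json):
            pts = np.array(item["points"], dtype=np.int32).reshape(-1, 1, 2)
            cv2.polylines(img, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
        cv2.imwrite(os.path.join(out_dir, os.path.basename(img_path)), img)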
That completes the detection model; next, train and test the recognition model.
III. Training the text recognition model
1. Modify the recognition model config file
Text recognition uses the ch_PP-OCRv3_rec.yml config file.
Find ch_PP-OCRv3_rec.yml under configs/rec/PP-OCRv3/.
The changes are similar to those made for text detection.
My ch_PP-OCRv3_rec.yml:
Global:
  debug: false
  use_gpu: true
  epoch_num: 50
  log_smooth_window: 20
  print_batch_step: 1
  save_model_dir: ./output/rec_ppocr_v3
  save_epoch_step: 15
  eval_batch_step: [3000, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_rec_train/student.pdparams  # path to the recognition pretrained model
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv4.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
    last_pool_kernel_size: [2, 2]
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    ext_op_transform_idx: 1
    label_file_list:
      - ./data/rec/train.txt          # recognition training label file
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
          max_text_length: *max_text_length
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 16
    drop_last: true
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/
    label_file_list:
      - ./data/rec/val.txt            # recognition evaluation label file
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 16
    num_workers: 8
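It is also worth checking the recognition labels against Global.max_text_length (25 in the config above), since samples whose text is longer than that are typically skipped during training. A minimal sketch, assuming the label file produced by the split script:

# Quick look at the recognition labels; MAX_TEXT_LENGTH mirrors Global.max_text_length above.
MAX_TEXT_LENGTH = 25
label_file = "./data/rec/train.txt"

lengths = []
with open(label_file, "r", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip("\n").split("\t")  # "<cropped image path>\t<transcription>"
        if len(parts) == 2 and parts[1]:
            lengths.append(len(parts[1]))

print(f"{len(lengths)} samples, longest label: {max(lengths)} characters")
print(f"labels longer than {MAX_TEXT_LENGTH}: {sum(l > MAX_TEXT_LENGTH for l in lengths)}")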
2. Train the model
Open an Anaconda terminal, activate the environment, go to the PaddleOCR-release-2.7 root directory, and run the following command to start training.
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml
Training is done.
3. Test the model
Run the following command in the Anaconda terminal, where Global.pretrained_model is the trained model to test and Global.infer_img is the path of the image to recognize.
python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=output/rec_ppocr_v3/latest.pdparams Global.infer_img="./data/test_images/1.jpeg"
That completes the recognition model.
***************************************************************************
IV. Exporting to an inference model
1. Run the export commands in the Anaconda terminal
Here Global.pretrained_model is the trained model to export and Global.save_inference_dir is where the inference model will be saved. An inference model can be loaded directly for detection and recognition. Export the trained text detection model and text recognition model separately.
python tools/export_model.py -c "./configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml" -o Global.pretrained_model="./output/ch_db_res18/latest.pdparams" Global.save_inference_dir="./inference_model/det/"
The log ends with: inference model is saved to ./inference_model/det/inference
python tools/export_model.py -c "./configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml" -o Global.pretrained_model="./output/rec_ppocr_v3/latest.pdparams" Global.save_inference_dir="./inference_model/rec/"
The log ends with: inference model is saved to ./inference_model/rec/inference
The det and rec folders now hold the exported inference models.
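Each exported folder should contain the serialized inference graph and its weights, roughly:

inference_model/
├── det/
│   ├── inference.pdmodel
│   ├── inference.pdiparams
│   └── inference.pdiparams.info
└── rec/
    ├── inference.pdmodel
    ├── inference.pdiparams
    └── inference.pdiparams.info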
2. Verify with predict_system.py
Open an Anaconda terminal and run:
python tools/infer/predict_system.py --det_model_dir="./inference_model/det/" --rec_model_dir="./inference_model/rec" --image_dir="./data/test_images/3.jpeg"
The results are saved in ./inference_results/.
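The exported models can also be called from Python through the standalone paddleocr package instead of predict_system.py. A rough sketch, assuming `pip install paddleocr` and noting that the exact structure of the returned result differs between paddleocr versions:

from paddleocr import PaddleOCR

# Point the pipeline at the models exported above; the dictionary matches the rec config.
ocr = PaddleOCR(
    det_model_dir="./inference_model/det/",
    rec_model_dir="./inference_model/rec/",
    rec_char_dict_path="./ppocr/utils/ppocr_keys_v1.txt",
    use_angle_cls=False,
)

result = ocr.ocr("./data/test_images/3.jpeg", cls=False)
for box, (text, score) in result[0]:   # result[0]: text lines of the first (only) image
    print(text, round(score, 3))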
V. Deploying the inference model on RK3588
To be continued...