Contents
1. Introduction
2. Code
3. Environment
4. Custom data layout
5. Config file
6. Training
7. Validation
8. Confusion-matrix evaluation
9. Exporting to ONNX
10. ONNX inference
-- Appendix: Docker environment
1. Introduction
Megvii's YOLOX is clean and works well. In short, its three main innovations are the decoupled head, anchor-free detection, and an advanced label assignment strategy (SimOTA).
2. Code
As of this writing I am using the main branch; YOLOX\yolox\__init__.py shows version 0.3.0.
GitHub - Megvii-BaseDetection/YOLOX: YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
https://github.com/Megvii-BaseDetection/YOLOX
3. Environment
(1) Follow requirements.txt; it is mostly painless.
(2) Alternatively, set up a Docker container; see the appendix at the end of this post.
Once installed, run the stock demo to make sure everything works. By default the inference results are written under YOLOX_outputs, in a folder named after the model (yolox_s here):
python tools/demo.py image -n yolox-s -c checkpoints/yolox_s.pth --path assets/dog.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device gpu
- image: inference mode; use video for video input
- -n: model name
- -c: path to the weight file
- --path: path to the test image
- --conf: confidence threshold
- --nms: NMS IoU threshold
- --tsize: test image size
- --save_result: save the inference result
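For a video input, the same script runs in video mode; a sketch (the video path is a placeholder for your own file):
python tools/demo.py video -n yolox-s -c checkpoints/yolox_s.pth --path /path/to/your.mp4 --conf 0.25 --nms 0.45 --tsize 640 --save_result --device gpu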
4. Custom data layout
For labeling and conversion, see the dataset part of an earlier post of mine on custom-data training and inference with mmdetection:
https://blog.csdn.net/hzy459176895/article/details/123690217
The core steps are: (1) annotate with labelme; (2) convert the labelme output to COCO-style labels; (3) arrange the result into the layout YOLOX expects for training.
Any other tooling works too, as long as the final layout fits this YOLOX training run. My custom dataset under the my_data folder ends up looking like the sketch below (downloading coco128 and using it as a template also works; the README mentions it).
Only annotations/ (the two JSON label files) and train2017/ / val2017/ (the images themselves) matter.
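A sketch of the layout (folder and file names are the COCO defaults the exp class expects):
datasets/my_data
├── annotations
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017
│   ├── 000000000001.jpg
│   └── ...
└── val2017
    ├── 000000000005.jpg
    └── ...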
The label JSON files are ordinary COCO object-detection annotations. Taking instances_val2017.json as an example, it looks like this:
{"info": {"year": 2021, "version": "1.0", "description": "For object detection", "date_created": "2021"}, "images": [{"date_captured": "2021", "file_name": "000000000001.jpg", "id": 1, "height": 480, "width": 640}, {"date_captured": "2021", "file_name": "000000000002.jpg", "id": 2, "height": 426, "width": 640}, {"date_captured": "2021", "file_name": "000000000003.jpg", "id": 3, "height": 428, "width": 640}, {"date_captured": "2021", "file_name": "000000000004.jpg", "id": 4, "height": 425, "width": 640}, {"date_captured": "2021", "file_name": "000000000005.jpg", "id": 5, "height": 640, "width": 481}], "licenses": [{"id": 1, "name": "GNU General Public License v3.0", "url": "https://github.com/zhiqwang/yolov5-rt-stack/blob/master/LICENSE"}], "type": "instances", "annotations": [{"segmentation": [[1.0799999999999272, 187.69008000000002, 612.66976, 187.69008000000002, 612.66976, 473.53008000000005, 1.0799999999999272, 473.53008000000005]], "area": 174816.81699840003, "iscrowd": 0, "image_id": 1, "bbox": [1.0799999999999272, 187.69008000000002, 611.5897600000001, 285.84000000000003], "category_id": 19, "id": 1}, {"segmentation": [[311.73024, 4.310159999999996, 631.0102400000001, 4.310159999999996, 631.0102400000001, 232.99032, 311.73024, 232.99032]], "area": 73013.00148480001, "iscrowd": 0, "image_id": 1, "bbox": [311.73024, 4.310159999999996, 319.28000000000003, 228.68016], "category_id": 50, "id": 2}, {"segmentation": [[249.60032, 229.27031999999997, 565.84032, 229.27031999999997, 565.84032, 474.35015999999996, 249.60032, 474.35015999999996]], "area": 77504.04860159999, "iscrowd": 0, "image_id": 1, "bbox": [249.60032, 229.27031999999997, 316.24, 245.07984], "category_id": 70, "id": 3}, {"segmentation": [[0.00031999999998788553, 13.510079999999988, 434.48032, 13.510079999999988, 434.48032, 388.63008, 0.00031999999998788553, 388.63008]], "area": 162982.13760000002, "iscrowd": 0, "image_id": 1, "bbox": [0.00031999999998788553, 13.510079999999988, 434.48, 375.12], "category_id": 38, "id": 4}, {"segmentation": [[376.2, 40.36008, 451.75007999999997, 40.36008, 451.75007999999997, 86.88983999999999, 376.2, 86.88983999999999]], "area": 3515.3270903807993, "iscrowd": 0, "image_id": 1, "bbox": [376.2, 40.36008, 75.55008, 46.529759999999996], "category_id": 33, "id": 5}, {"segmentation": [[465.77984, 38.97, 523.8496, 38.97, 523.8496, 85.63991999999999, 465.77984, 85.63991999999999]], "area": 2710.1110536191995, "iscrowd": 0, "image_id": 1, "bbox": [465.77984, 38.97, 58.069759999999995, 46.66992], "category_id": 8, "id": 6}, {"segmentation": [[385.70016, 73.65984, 469.71999999999997, 73.65984, 469.71999999999997, 144.16992, 385.70016, 144.16992]], "area": 5924.245639987201, "iscrowd": 0, "image_id": 1, "bbox": [385.70016, 73.65984, 84.01984, 70.51008], "category_id": 62, "id": 7}, {"segmentation": [[364.0496, 2.49024, 458.80992000000003, 2.49024, 458.80992000000003, 73.56, 364.0496, 73.56]], "area": 6734.593199923201, "iscrowd": 0, "image_id": 1, "bbox": [364.0496, 2.49024, 94.76032000000001, 71.06976], "category_id": 45, "id": 8}, {"segmentation": [[385.52992, 60.030002999999994, 600.50016, 60.030002999999994, 600.50016, 357.19013700000005, 385.52992, 357.19013700000005]], "area": 63880.58532441216, "iscrowd": 0, "image_id": 2, "bbox": [385.52992, 60.030002999999994, 214.97024, 297.160134], "category_id": 71, "id": 9}, {"segmentation": [[53.01024000000001, 356.49000599999994, 185.04032, 356.49000599999994, 185.04032, 411.6800099999999, 53.01024000000001, 411.6800099999999]], "area": 7286.7406433203205, 
"iscrowd": 0, "image_id": 2, "bbox": [53.01024000000001, 356.49000599999994, 132.03008, 55.190004], "category_id": 27, "id": 10}, {"segmentation": [[204.86016, 31.019728000000015, 459.74016, 31.019728000000015, 459.74016, 355.13984800000003, 204.86016, 355.13984800000003]], "area": 82611.73618559999, "iscrowd": 0, "image_id": 3, "bbox": [204.86016, 31.019728000000015, 254.88, 324.12012], "category_id": 27, "id": 11}, {"segmentation": [[237.56032, 155.809976, 403.96032, 155.809976, 403.96032, 351.060152, 237.56032, 351.060152]], "area": 32489.6292864, "iscrowd": 0, "image_id": 3, "bbox": [237.56032, 155.809976, 166.4, 195.25017599999998], "category_id": 58, "id": 12}, {"segmentation": [[0.960000000000008, 20.060000000000002, 442.19007999999997, 20.060000000000002, 442.19007999999997, 399.21015, 0.960000000000008, 399.21015]], "area": 167292.451016512, "iscrowd": 0, "image_id": 4, "bbox": [0.960000000000008, 20.060000000000002, 441.23008, 379.15015], "category_id": 19, "id": 13}, {"segmentation": [[0, 50.11967999999999, 457.680158, 50.11967999999999, 457.680158, 480.46975999999995, 0, 480.46975999999995]], "area": 196962.69260971263, "iscrowd": 0, "image_id": 5, "bbox": [0, 50.11967999999999, 457.680158, 430.35008], "category_id": 35, "id": 14}, {"segmentation": [[167.5801595, 162.88991999999993, 478.19023849999996, 162.88991999999993, 478.19023849999996, 628.0796799999999, 167.5801595, 628.0796799999999]], "area": 144492.62810359104, "iscrowd": 0, "image_id": 5, "bbox": [167.5801595, 162.88991999999993, 310.610079, 465.18976000000004], "category_id": 57, "id": 15}], "categories": [{"id": 1, "name": "0", "supercategory": "0"}, {"id": 2, "name": "1", "supercategory": "1"}, {"id": 3, "name": "2", "supercategory": "2"}, {"id": 4, "name": "3", "supercategory": "3"}, {"id": 5, "name": "4", "supercategory": "4"}, {"id": 6, "name": "5", "supercategory": "5"}, {"id": 7, "name": "6", "supercategory": "6"}, {"id": 8, "name": "7", "supercategory": "7"}, {"id": 9, "name": "8", "supercategory": "8"}, {"id": 10, "name": "9", "supercategory": "9"}, {"id": 11, "name": "10", "supercategory": "10"}, {"id": 12, "name": "11", "supercategory": "11"}, {"id": 13, "name": "12", "supercategory": "12"}, {"id": 14, "name": "13", "supercategory": "13"}, {"id": 15, "name": "14", "supercategory": "14"}, {"id": 16, "name": "15", "supercategory": "15"}, {"id": 17, "name": "16", "supercategory": "16"}, {"id": 18, "name": "17", "supercategory": "17"}, {"id": 19, "name": "18", "supercategory": "18"}, {"id": 20, "name": "19", "supercategory": "19"}, {"id": 21, "name": "20", "supercategory": "20"}, {"id": 22, "name": "21", "supercategory": "21"}, {"id": 23, "name": "22", "supercategory": "22"}, {"id": 24, "name": "23", "supercategory": "23"}, {"id": 25, "name": "24", "supercategory": "24"}, {"id": 26, "name": "25", "supercategory": "25"}, {"id": 27, "name": "26", "supercategory": "26"}, {"id": 28, "name": "27", "supercategory": "27"}, {"id": 29, "name": "28", "supercategory": "28"}, {"id": 30, "name": "29", "supercategory": "29"}, {"id": 31, "name": "30", "supercategory": "30"}, {"id": 32, "name": "31", "supercategory": "31"}, {"id": 33, "name": "32", "supercategory": "32"}, {"id": 34, "name": "33", "supercategory": "33"}, {"id": 35, "name": "34", "supercategory": "34"}, {"id": 36, "name": "35", "supercategory": "35"}, {"id": 37, "name": "36", "supercategory": "36"}, {"id": 38, "name": "37", "supercategory": "37"}, {"id": 39, "name": "38", "supercategory": "38"}, {"id": 40, "name": "39", "supercategory": "39"}, {"id": 41, 
"name": "40", "supercategory": "40"}, {"id": 42, "name": "41", "supercategory": "41"}, {"id": 43, "name": "42", "supercategory": "42"}, {"id": 44, "name": "43", "supercategory": "43"}, {"id": 45, "name": "44", "supercategory": "44"}, {"id": 46, "name": "45", "supercategory": "45"}, {"id": 47, "name": "46", "supercategory": "46"}, {"id": 48, "name": "47", "supercategory": "47"}, {"id": 49, "name": "48", "supercategory": "48"}, {"id": 50, "name": "49", "supercategory": "49"}, {"id": 51, "name": "50", "supercategory": "50"}, {"id": 52, "name": "51", "supercategory": "51"}, {"id": 53, "name": "52", "supercategory": "52"}, {"id": 54, "name": "53", "supercategory": "53"}, {"id": 55, "name": "54", "supercategory": "54"}, {"id": 56, "name": "55", "supercategory": "55"}, {"id": 57, "name": "56", "supercategory": "56"}, {"id": 58, "name": "57", "supercategory": "57"}, {"id": 59, "name": "58", "supercategory": "58"}, {"id": 60, "name": "59", "supercategory": "59"}, {"id": 61, "name": "60", "supercategory": "60"}, {"id": 62, "name": "61", "supercategory": "61"}, {"id": 63, "name": "62", "supercategory": "62"}, {"id": 64, "name": "63", "supercategory": "63"}, {"id": 65, "name": "64", "supercategory": "64"}, {"id": 66, "name": "65", "supercategory": "65"}, {"id": 67, "name": "66", "supercategory": "66"}, {"id": 68, "name": "67", "supercategory": "67"}, {"id": 69, "name": "68", "supercategory": "68"}, {"id": 70, "name": "69", "supercategory": "69"}, {"id": 71, "name": "70", "supercategory": "70"}]}
5. Config file
Copy exps\default\yolox_s.py to exps\yolox_s_me.py and change it as follows:
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

import os

from yolox.exp import Exp as MyExp

"""
Notes for quick training on custom data:
1. Under datasets/my_data/annotations, keep the same file names as coco128
   (instances_train2017.json and instances_val2017.json) so they can be used directly.
2. Name the image folders train2017 and val2017 as well (the COCO dataset class uses these names).
3. In yolox/data/datasets/coco_classes.py, set COCO_CLASSES to the class names of this dataset.
"""
from yolox.data.datasets.coco_classes import COCO_CLASSES  # edit the class names here before running

DATA_DIR = "datasets/my_data"  # the custom dataset for this training run


class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define yourself dataset path
        self.data_dir = DATA_DIR
        self.train_ann = "instances_train2017.json"
        self.val_ann = "instances_val2017.json"

        self.num_classes = len(COCO_CLASSES)
        self.max_epoch = 50
        self.data_num_workers = 4
        self.eval_interval = 10   # evaluate every N epochs
        self.print_interval = 10  # print every N iterations
        self.save_history_ckpt = False  # only keep the latest checkpoint; True keeps one per epoch
Two points about the code above.
First, I kept the COCO dataset file names unchanged, because the Exp base class uses the COCO annotation file names directly.
Second, edit COCO_CLASSES in yolox\data\datasets\coco_classes.py to hold your own classes, so the config file can simply import it, e.g. COCO_CLASSES = ('c1', 'c2').
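For reference, the edited coco_classes.py can be as small as this ('c1' and 'c2' are placeholders for your own labels; keep the order consistent with the category ids in your annotation JSON so the visualized names line up):
# yolox/data/datasets/coco_classes.py
COCO_CLASSES = (
    "c1",
    "c2",
)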
6. Training
Copy tools\train.py to the repo root as, say, my_train.py, and change only the main block: point it at your config file and set a few hyperparameters, leaving everything above it untouched:
if __name__ == "__main__":
    # export CUDA_VISIBLE_DEVICES=1  # run this in the shell first to expose GPU 1 (must be upper case)
    configure_module()
    args = make_parser().parse_args()

    # For custom data, these four overrides are the main change.
    # Training output goes to YOLOX_outputs/<config file name> by default.
    args.exp_file = "exps/yolox_s_me.py"   # the config file defined above (continue training on top of the pretrained weights)
    args.devices = 1                       # train on one GPU (the parser flag is -d/--devices)
    args.batch_size = 16
    args.ckpt = "checkpoints/yolox_s.pth"  # pretrained weights; download per the README (yolox_l etc. pair with the larger config files)

    exp = get_exp(args.exp_file, args.name)
    exp.merge(args.opts)
    check_exp_value(exp)

    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    num_gpu = get_num_devices() if args.devices is None else args.devices
    assert num_gpu <= get_num_devices()

    if args.cache is not None:
        exp.dataset = exp.get_dataset(cache=True, cache_type=args.cache)

    dist_url = "auto" if args.dist_url is None else args.dist_url
    launch(
        main,
        num_gpu,
        args.num_machines,
        args.machine_rank,
        backend=args.dist_backend,
        dist_url=dist_url,
        args=(exp, args),
    )
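With those overrides in place, training is launched from the repo root (assuming the copy is named my_train.py as above):
export CUDA_VISIBLE_DEVICES=1   # optional: expose only GPU 1
python my_train.py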
After training finishes, YOLOX_outputs\yolox_s_me contains the logs, the best_ckpt.pth model, and so on.
7. Validation
Copy tools/eval.py to the repo root as my_eval.py. For a simple check, leave the rest alone and just set the config file, your best_ckpt.pth, and a few options in the main block; this evaluates the validation set:
if __name__ == "__main__":
    """
    To run on GPU:
    1. In the shell, run export CUDA_VISIBLE_DEVICES=1 to expose GPU 1.
    2. In the code, args.devices = 1 means use one GPU.
    """
    args = make_parser().parse_args()  # parse the default CLI args first (as in tools/eval.py)

    # custom-data evaluation
    from yolox.data.datasets.coco_classes import COCO_CLASSES  # edit the class names here before running
    args.exp_file = "exps/yolox_s_me.py"
    args.ckpt = "YOLOX_outputs/yolox_s_me/best_ckpt.pth"
    args.batch_size = 2
    args.devices = 1          # one GPU (the parser flag is -d/--devices)
    args.conf = 0.5
    args.nms = 0.5
    args.test = False         # no test split here; evaluate the val set only

    exp = get_exp(args.exp_file, args.name)
    exp.merge(args.opts)

    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    num_gpu = torch.cuda.device_count() if args.devices is None else args.devices
    assert num_gpu <= torch.cuda.device_count()

    dist_url = "auto" if args.dist_url is None else args.dist_url
    launch(
        main,
        num_gpu,
        args.num_machines,
        args.machine_rank,
        backend=args.dist_backend,
        dist_url=dist_url,
        args=(exp, args, num_gpu),
    )
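Then run it from the repo root (assuming the copy is named my_eval.py); the COCO-style AP summary for the val set is printed at the end:
python my_eval.py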
8. Confusion-matrix evaluation
Object detection is usually reported as mAP, which is awkward for reporting to management; they just want to know how many objects were right and how many were wrong, so it helps to convert the results into something like a classification confusion matrix with precision and recall. (Note: the unmodified demo.py is for plain image inference: if args.path is an image it infers that one image, if it is a folder it infers every image in it, and the results again land in YOLOX_outputs.)
Below, I copied tools/demo.py to demo_metric.py and added some logic to evaluate precision and recall:
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.
import argparse
import json
import os
import time
import numpy as np
from loguru import logger
import cv2
import torch
from yolox.data.data_augment import ValTransform
from yolox.exp import get_exp
from yolox.utils import fuse_model, get_model_info, postprocess, vis
IMAGE_EXT = [".jpg", ".jpeg", ".webp", ".bmp", ".png"]
# Calculate precision and recall for object detection
def calculate_precision_recall(detected_boxes, true_boxes, iou_threshold):
    """
    Calculate precision and recall for object detection
    :param detected_boxes: list of detected bounding boxes in format [xmin, ymin, xmax, ymax]
    :param true_boxes: list of true bounding boxes in format [xmin, ymin, xmax, ymax]
    :param iou_threshold: intersection over union threshold for matching detected and true boxes
    :return: precision and recall
    """
    num_true_boxes = len(true_boxes)
    true_positives = 0
    for true_box in true_boxes:
        max_iou = 0
        for detected_box in detected_boxes:
            iou = calculate_iou(detected_box, true_box)
            if iou > max_iou:
                max_iou = iou
            if max_iou >= iou_threshold:
                true_positives += 1
                break
    false_positives = len(detected_boxes) - true_positives
    false_negatives = num_true_boxes - true_positives
    # guard against empty detections / empty ground truth to avoid division by zero
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0.0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0.0
    # print('TP: ', true_positives)
    # print('FP: ', false_positives)
    # print('FN: ', false_negatives)
    return precision, recall
def calculate_iou(box1, box2):
    """
    Calculate intersection over union (IoU) between two bounding boxes
    :param box1: bounding box in format [xmin, ymin, xmax, ymax]
    :param box2: bounding box in format [xmin, ymin, xmax, ymax]
    :return: IoU between box1 and box2
    """
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_box1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area_box2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area_box1 + area_box2 - intersection
    iou = intersection / union
    return iou
def make_parser():
    parser = argparse.ArgumentParser("YOLOX Demo!")
    # parser.add_argument(
    #     "demo", default="image", help="demo type, eg. image, video and webcam"
    # )
    parser.add_argument(
        "--demo", default="image", help="demo type, eg. image, video and webcam"
    )
    parser.add_argument("-expn", "--experiment-name", type=str, default=None)
    parser.add_argument("-n", "--name", type=str, default=None, help="model name")
    parser.add_argument(
        "--path", default="./assets/dog.jpg", help="path to images or video"
    )
    parser.add_argument("--camid", type=int, default=0, help="webcam demo camera id")
    parser.add_argument(
        "--save_result",
        action="store_true",
        help="whether to save the inference result of image/video",
    )
    # exp file
    parser.add_argument(
        "-f",
        "--exp_file",
        default=None,
        type=str,
        help="please input your experiment description file",
    )
    parser.add_argument("-c", "--ckpt", default=None, type=str, help="ckpt for eval")
    parser.add_argument(
        "--device",
        default="cpu",
        type=str,
        help="device to run our model, can either be cpu or gpu",
    )
    parser.add_argument("--conf", default=0.3, type=float, help="test conf")
    parser.add_argument("--nms", default=0.3, type=float, help="test nms threshold")
    parser.add_argument("--tsize", default=None, type=int, help="test img size")
    parser.add_argument(
        "--fp16",
        dest="fp16",
        default=False,
        action="store_true",
        help="Adopting mix precision evaluating.",
    )
    parser.add_argument(
        "--legacy",
        dest="legacy",
        default=False,
        action="store_true",
        help="To be compatible with older versions",
    )
    parser.add_argument(
        "--fuse",
        dest="fuse",
        default=False,
        action="store_true",
        help="Fuse conv and bn for testing.",
    )
    parser.add_argument(
        "--trt",
        dest="trt",
        default=False,
        action="store_true",
        help="Using TensorRT model for testing.",
    )
    return parser
def get_image_list(path):
    image_names = []
    for maindir, subdir, file_name_list in os.walk(path):
        for filename in file_name_list:
            apath = os.path.join(maindir, filename)
            ext = os.path.splitext(apath)[1]
            if ext in IMAGE_EXT:
                image_names.append(apath)
    return image_names
class Predictor(object):
    def __init__(
        self,
        model,
        exp,
        cls_names,
        trt_file=None,
        decoder=None,
        device="cpu",
        fp16=False,
        legacy=False,
    ):
        self.model = model
        self.cls_names = cls_names
        self.decoder = decoder
        self.num_classes = exp.num_classes
        self.confthre = exp.test_conf
        self.nmsthre = exp.nmsthre
        self.test_size = exp.test_size
        self.device = device
        self.fp16 = fp16
        self.preproc = ValTransform(legacy=legacy)
        if trt_file is not None:
            from torch2trt import TRTModule

            model_trt = TRTModule()
            model_trt.load_state_dict(torch.load(trt_file))

            x = torch.ones(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
            self.model(x)
            self.model = model_trt

    def inference(self, img):
        img_info = {"id": 0}
        if isinstance(img, str):
            img_info["file_name"] = os.path.basename(img)
            img = cv2.imread(img)
        else:
            img_info["file_name"] = None

        height, width = img.shape[:2]
        img_info["height"] = height
        img_info["width"] = width
        img_info["raw_img"] = img

        ratio = min(self.test_size[0] / img.shape[0], self.test_size[1] / img.shape[1])
        img_info["ratio"] = ratio

        img, _ = self.preproc(img, None, self.test_size)
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.float()
        if self.device == "gpu":
            img = img.cuda()
            if self.fp16:
                img = img.half()  # to FP16

        with torch.no_grad():
            t0 = time.time()
            outputs = self.model(img)
            if self.decoder is not None:
                outputs = self.decoder(outputs, dtype=outputs.type())
            outputs = postprocess(
                outputs, self.num_classes, self.confthre,
                self.nmsthre, class_agnostic=True
            )  # (x1, y1, x2, y2, obj_conf, class_conf, class_pred): objectness, per-class score, class index
            logger.info("Infer time: {:.4f}s".format(time.time() - t0))
        return outputs, img_info

    def visual(self, output, img_info, cls_conf=0.35):
        ratio = img_info["ratio"]
        img = img_info["raw_img"]
        if output is None:
            return img
        output = output.cpu()

        bboxes = output[:, 0:4]
        # preprocessing: resize
        bboxes /= ratio

        cls = output[:, 6]
        scores = output[:, 4] * output[:, 5]

        vis_res = vis(img, bboxes, scores, cls, cls_conf, self.cls_names)
        return vis_res
def image_demo(predictor, path, args):
    if os.path.isdir(path):
        files = get_image_list(path)
    else:
        files = [path]
    files.sort()
    current_time = time.localtime()
    # single image only
    outputs, img_info = predictor.inference(files[0])
    return outputs, img_info
def imageflow_demo(predictor, vis_folder, current_time, args):
    cap = cv2.VideoCapture(args.path if args.demo == "video" else args.camid)
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)  # float
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
    fps = cap.get(cv2.CAP_PROP_FPS)
    if args.save_result:
        save_folder = os.path.join(
            vis_folder, time.strftime("%Y_%m_%d_%H_%M_%S", current_time)
        )
        os.makedirs(save_folder, exist_ok=True)
        if args.demo == "video":
            save_path = os.path.join(save_folder, os.path.basename(args.path))
        else:
            save_path = os.path.join(save_folder, "camera.mp4")
        logger.info(f"video save_path is {save_path}")
        vid_writer = cv2.VideoWriter(
            save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (int(width), int(height))
        )
    while True:
        ret_val, frame = cap.read()
        if ret_val:
            outputs, img_info = predictor.inference(frame)
            result_frame = predictor.visual(outputs[0], img_info, predictor.confthre)
            if args.save_result:
                vid_writer.write(result_frame)
            else:
                cv2.namedWindow("yolox", cv2.WINDOW_NORMAL)
                cv2.imshow("yolox", result_frame)
            ch = cv2.waitKey(1)
            if ch == 27 or ch == ord("q") or ch == ord("Q"):
                break
        else:
            break
def main(exp, args, COCO_CLASSES_):
    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    file_name = os.path.join(exp.output_dir, args.experiment_name)
    os.makedirs(file_name, exist_ok=True)

    vis_folder = None
    if args.save_result:
        vis_folder = os.path.join(file_name, "vis_res")
        os.makedirs(vis_folder, exist_ok=True)

    if args.trt:
        args.device = "gpu"

    logger.info("Args: {}".format(args))

    if args.conf is not None:
        exp.test_conf = args.conf
    if args.nms is not None:
        exp.nmsthre = args.nms
    if args.tsize is not None:
        exp.test_size = (args.tsize, args.tsize)

    model = exp.get_model()
    logger.info("Model Summary: {}".format(get_model_info(model, exp.test_size)))

    if args.device == "gpu":
        model.cuda()
        if args.fp16:
            model.half()  # to FP16
    model.eval()

    if not args.trt:
        if args.ckpt is None:
            ckpt_file = os.path.join(file_name, "best_ckpt.pth")
        else:
            ckpt_file = args.ckpt
        logger.info("loading checkpoint")
        ckpt = torch.load(ckpt_file, map_location="cpu")
        # load the model state dict
        model.load_state_dict(ckpt["model"])
        logger.info("loaded checkpoint done.")

    if args.fuse:
        logger.info("\tFusing model...")
        model = fuse_model(model)

    if args.trt:
        assert not args.fuse, "TensorRT model is not support model fusing!"
        trt_file = os.path.join(file_name, "model_trt.pth")
        assert os.path.exists(
            trt_file
        ), "TensorRT model is not found!\n Run python3 tools/trt.py first!"
        model.head.decode_in_inference = False
        decoder = model.head.decode_outputs
        logger.info("Using TensorRT to inference")
    else:
        trt_file = None
        decoder = None

    predictor = Predictor(
        model=model, exp=exp, cls_names=COCO_CLASSES_, trt_file=trt_file, decoder=decoder,
        device=args.device, fp16=args.fp16, legacy=args.legacy,
    )
    # single image inference
    outputs, img_info = image_demo(predictor, args.path, args)
    return outputs, img_info
if __name__ == "__main__":
    args = make_parser().parse_args()

    # custom-data evaluation
    from yolox.data.datasets.coco_classes import COCO_CLASSES  # edit the class names here before running
    args.exp_file = "exps/yolox_s_me.py"
    args.ckpt = "YOLOX_outputs/yolox_s_me/ckpt_200.pth"
    data_path = "datasets/xxx/val2017"
    val_jsons = "datasets/xxx/annotations/instances_val2017.json"
    COCO_CLASSES_ = COCO_CLASSES
    args.conf = 0.5
    args.nms = 0.5
    args.device = "cpu"  # stay on cpu so outputs convert to numpy easily
    args.save_result = False

    # collect all ground-truth boxes from the val annotation file
    with open(val_jsons, "r") as f:
        val_labels = json.load(f)
    res_id_img = {}
    imgs = []
    for image_info in val_labels['images']:
        res_id_img[image_info['id']] = image_info['file_name']
        imgs.append(image_info['file_name'])
    bbox_res = {}
    for ann in val_labels['annotations']:
        # COCO bbox is [x, y, w, h]; convert to [xmin, ymin, xmax, ymax]
        bbox_new = [ann['bbox'][0], ann['bbox'][1], ann['bbox'][0] + ann['bbox'][2], ann['bbox'][1] + ann['bbox'][3]]
        if res_id_img[ann['image_id']] not in bbox_res.keys():
            bbox_res[res_id_img[ann['image_id']]] = [bbox_new]
        else:
            tmp_list = bbox_res[res_id_img[ann['image_id']]]
            tmp_list.append(bbox_new)
            bbox_res[res_id_img[ann['image_id']]] = tmp_list

    res_all = {}
    imgs = list(set(imgs))
    # imgs = ["1.jpg"]  # debug a single image
    for img in imgs:
        img_file = os.path.join(data_path, img)
        args.path = img_file
        # inference
        exp = get_exp(args.exp_file, args.name)
        res, img_info = main(exp, args, COCO_CLASSES_)
        res_list = np.array(res[0]).tolist() if res[0] is not None else []  # no detections -> empty list
        infer_bbox = [[ii[0] / img_info["ratio"], ii[1] / img_info["ratio"], ii[2] / img_info["ratio"], ii[3] / img_info["ratio"]] for ii in res_list]  # img_info carries the resize ratio
        if img not in res_all.keys():
            res_all[img] = [infer_bbox]
        else:
            tmp = res_all[img]
            tmp.append(infer_bbox)
            res_all[img] = tmp

    precision_all = []
    recall_all = []
    print()
    iou_threshold = 0.3  # IoU threshold used when matching boxes for precision/recall
    for img in imgs:
        precision, recall = calculate_precision_recall(detected_boxes=res_all[img][0], true_boxes=bbox_res[img], iou_threshold=iou_threshold)
        print(img, precision, recall)
        precision_all.append(precision)
        recall_all.append(recall)
    print()
    print('val-set mean precision ', float(sum(precision_all) / len(precision_all)))
    print('val-set mean recall    ', float(sum(recall_all) / len(recall_all)))
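As a reminder of what the per-image numbers mean: precision = TP / (TP + FP) and recall = TP / (TP + FN), where a detection counts as a true positive once its IoU with a ground-truth box reaches iou_threshold. A toy check of calculate_precision_recall with hand-made boxes (values are illustrative only, not from a real run):
gt = [[0, 0, 100, 100], [200, 200, 300, 300]]   # two ground-truth boxes
det = [[5, 5, 95, 95], [400, 400, 450, 450]]    # one good hit, one false positive
p, r = calculate_precision_recall(det, gt, iou_threshold=0.3)
print(p, r)  # precision 0.5 (1 TP out of 2 detections), recall 0.5 (1 TP out of 2 GT boxes)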
9. Exporting to ONNX
Likewise, copy tools/export_onnx.py, add the following to it, and run it to export the ONNX model; the remaining outputs stay under the path mentioned earlier:
args = make_parser().parse_args()
# add the following right below the args line in the main block:
os.makedirs("YOLOX_onnx", exist_ok=True)  # os.mkdir would fail on a second run if the folder already exists
args.output_name = "YOLOX_onnx/yolox_s_me.onnx"
args.exp_file = "exps/yolox_s_me.py"
args.ckpt = "YOLOX_outputs/yolox_s_me/best_ckpt.pth"
10. ONNX inference
The main things to get right are the input preprocessing (YOLOX defaults to 640x640) and the onnxruntime input format. Object detection on images with the exported ONNX model looks like this:
import os

import cv2
import numpy as np
import onnxruntime

from yolox.data.data_augment import preproc as preprocess
from yolox.utils.demo_utils import multiclass_nms, demo_postprocess
from yolox.utils.visualize import vis

"""
yolox_onnx inference demo
"""

if __name__ == '__main__':
    COCO_CLASSES = ('c1', 'c2', ...)  # your class names
    data_path = "test_datas"          # folder of test images
    model = "/xxx/yolox_s_me.onnx"    # the ONNX model exported above
    output_dir = "/xxx/xxx/result"    # folder for the visualized results
    score_thr = 0.5
    NMS = 0.5

    input_shape = "640, 640"
    input_shape = tuple(map(int, input_shape.split(',')))
    session = onnxruntime.InferenceSession(model)  # create the session once, outside the image loop

    for img in os.listdir(data_path):
        image_path = os.path.join(data_path, img)
        origin_img = cv2.imread(image_path)
        img, ratio = preprocess(origin_img, input_shape)

        ort_inputs = {session.get_inputs()[0].name: img[None, :, :, :]}  # ort input: 1 x 3 x 640 x 640
        output = session.run(None, ort_inputs)  # output shape [1, 8400, 12] in my case
        predictions = demo_postprocess(output[0], input_shape)[0]

        boxes = predictions[:, :4]
        scores = predictions[:, 4:5] * predictions[:, 5:]

        # convert center-x, center-y, w, h to xmin, ymin, xmax, ymax and undo the resize
        boxes_xyxy = np.ones_like(boxes)
        boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
        boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
        boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
        boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
        boxes_xyxy /= ratio

        dets = multiclass_nms(boxes_xyxy, scores, nms_thr=NMS, score_thr=score_thr)
        if dets is not None:
            final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
            origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
                             conf=score_thr, class_names=COCO_CLASSES)

        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        output_path = os.path.join(output_dir, os.path.basename(image_path))
        cv2.imwrite(output_path, origin_img)
        print('one img infer ok.')
    print('all img infer ok !!!')
-- Appendix: Docker environment
- Dockerfile (pull a torch base image from the Aliyun registry):
# https://www.modelscope.cn/docs/环境安装  # GPU image (python3.10)
FROM registry.cn-beijing.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0
# FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0
# FROM registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0
RUN mkdir /opt/code
WORKDIR /opt/code
- Build the image: docker build -t hxy_base_image .
- Create the container: docker run --name hxy_mmcls -d -p 9528:22 --shm-size=1g hxy_base_image tail -f /dev/null
  (-d runs it in the background, -p maps the port, --shm-size sets the shared-memory size, tail -f /dev/null just keeps the container alive)
- docker run has other options that can be added as needed.
- docker exec -it <container id> /bin/bash: opens a shell inside the container
- docker images and docker ps | grep hxy: list images and containers, etc.
For YOLOX, the only extra package to install inside is pip install loguru; after that it should be ready to use.
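If the YOLOX repo lives on the host, it can also be mounted into the container at creation time; a sketch (paths and the container name are placeholders, and --gpus all assumes the NVIDIA container toolkit is installed):
docker run --name hxy_yolox -d -p 9528:22 --shm-size=8g --gpus all -v /path/to/YOLOX:/opt/code/YOLOX hxy_base_image tail -f /dev/null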
- Extra: exposing ssh and so on:
In vim /etc/ssh/sshd_config, enable the following settings:
Port 22
AddressFamily any
ListenAddress 0.0.0.0
PermitRootLogin yes
PermitEmptyPasswords yes
PasswordAuthentication yes
# restart ssh
service ssh restart
# set the root password: passwd root
From outside you can then log in as root/root with the host IP and mapped port (for other shells, PyCharm, or other IDE remote logins).
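For example, from another machine (replace <host-ip> with the host's address; 9528 is the port mapped above):
ssh root@<host-ip> -p 9528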