Contents
1. Introduction
2. Code
3. Environment
4. Custom data layout
5. Config file
6. Training
7. Validation
8. Confusion-matrix evaluation
9. Exporting to ONNX
10. ONNX inference
-- Appendix: Docker environment
1. Introduction
Megvii's YOLOX is clean and works well. In short, its three main innovations are the decoupled head, anchor-free detection, and an advanced label assignment strategy (SimOTA).
2. Code
As of this writing I am using the main branch; YOLOX\yolox\__init__.py shows version 0.3.0.
GitHub - Megvii-BaseDetection/YOLOX: YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
https://github.com/Megvii-BaseDetection/YOLOX
3. Environment
(1) Follow requirements.txt; it is mostly painless.
(2) Alternatively, set up a Docker container; see the appendix at the end of this post.
Once installed, run the stock demo to make sure everything works. By default the inference results are written under YOLOX_outputs, in a folder named after the model (yolox_s here):
python tools/demo.py image -n yolox-s -c checkpoints/yolox_s.pth --path assets/dog.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device gpu
- image: inference mode; use video for video input
- -n: model name
- -c: path to the weight file
- --path: path to the test image
- --conf: confidence threshold
- --nms: NMS IoU threshold
- --tsize: test image size
- --save_result: save the inference result
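For a video input, the same script runs in video mode; a sketch (the video path is a placeholder for your own file):
python tools/demo.py video -n yolox-s -c checkpoints/yolox_s.pth --path /path/to/your.mp4 --conf 0.25 --nms 0.45 --tsize 640 --save_result --device gpu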
4. Custom data layout
For labeling and conversion, see the dataset part of an earlier post of mine on custom-data training and inference with mmdetection:
https://blog.csdn.net/hzy459176895/article/details/123690217
The core steps are: (1) annotate with labelme; (2) convert the labelme output to COCO-style labels; (3) arrange the result into the layout YOLOX expects for training.
Any other tooling works too, as long as the final layout fits this YOLOX training run. My custom dataset under the my_data folder ends up looking like the sketch below (downloading coco128 and using it as a template also works; the README mentions it).
Only annotations/ (the two JSON label files) and train2017/ / val2017/ (the images themselves) matter.
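A sketch of the layout (folder and file names are the COCO defaults the exp class expects):
datasets/my_data
├── annotations
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017
│   ├── 000000000001.jpg
│   └── ...
└── val2017
    ├── 000000000005.jpg
    └── ...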
The label JSON files are ordinary COCO object-detection annotations. Taking instances_val2017.json as an example, it looks like this:
{"info": {"year": 2021, "version": "1.0", "description": "For object detection", "date_created": "2021"}, "images": [{"date_captured": "2021", "file_name": "000000000001.jpg", "id": 1, "height": 480, "width": 640}, {"date_captured": "2021", "file_name": "000000000002.jpg", "id": 2, "height": 426, "width": 640}, {"date_captured": "2021", "file_name": "000000000003.jpg", "id": 3, "height": 428, "width": 640}, {"date_captured": "2021", "file_name": "000000000004.jpg", "id": 4, "height": 425, "width": 640}, {"date_captured": "2021", "file_name": "000000000005.jpg", "id": 5, "height": 640, "width": 481}], "licenses": [{"id": 1, "name": "GNU General Public License v3.0", "url": "https://github.com/zhiqwang/yolov5-rt-stack/blob/master/LICENSE"}], "type": "instances", "annotations": [{"segmentation": [[1.0799999999999272, 187.69008000000002, 612.66976, 187.69008000000002, 612.66976, 473.53008000000005, 1.0799999999999272, 473.53008000000005]], "area": 174816.81699840003, "iscrowd": 0, "image_id": 1, "bbox": [1.0799999999999272, 187.69008000000002, 611.5897600000001, 285.84000000000003], "category_id": 19, "id": 1}, {"segmentation": [[311.73024, 4.310159999999996, 631.0102400000001, 4.310159999999996, 631.0102400000001, 232.99032, 311.73024, 232.99032]], "area": 73013.00148480001, "iscrowd": 0, "image_id": 1, "bbox": [311.73024, 4.310159999999996, 319.28000000000003, 228.68016], "category_id": 50, "id": 2}, {"segmentation": [[249.60032, 229.27031999999997, 565.84032, 229.27031999999997, 565.84032, 474.35015999999996, 249.60032, 474.35015999999996]], "area": 77504.04860159999, "iscrowd": 0, "image_id": 1, "bbox": [249.60032, 229.27031999999997, 316.24, 245.07984], "category_id": 70, "id": 3}, {"segmentation": [[0.00031999999998788553, 13.510079999999988, 434.48032, 13.510079999999988, 434.48032, 388.63008, 0.00031999999998788553, 388.63008]], "area": 162982.13760000002, "iscrowd": 0, "image_id": 1, "bbox": [0.00031999999998788553, 13.510079999999988, 434.48, 375.12], "category_id": 38, "id": 4}, {"segmentation": [[376.2, 40.36008, 451.75007999999997, 40.36008, 451.75007999999997, 86.88983999999999, 376.2, 86.88983999999999]], "area": 3515.3270903807993, "iscrowd": 0, "image_id": 1, "bbox": [376.2, 40.36008, 75.55008, 46.529759999999996], "category_id": 33, "id": 5}, {"segmentation": [[465.77984, 38.97, 523.8496, 38.97, 523.8496, 85.63991999999999, 465.77984, 85.63991999999999]], "area": 2710.1110536191995, "iscrowd": 0, "image_id": 1, "bbox": [465.77984, 38.97, 58.069759999999995, 46.66992], "category_id": 8, "id": 6}, {"segmentation": [[385.70016, 73.65984, 469.71999999999997, 73.65984, 469.71999999999997, 144.16992, 385.70016, 144.16992]], "area": 5924.245639987201, "iscrowd": 0, "image_id": 1, "bbox": [385.70016, 73.65984, 84.01984, 70.51008], "category_id": 62, "id": 7}, {"segmentation": [[364.0496, 2.49024, 458.80992000000003, 2.49024, 458.80992000000003, 73.56, 364.0496, 73.56]], "area": 6734.593199923201, "iscrowd": 0, "image_id": 1, "bbox": [364.0496, 2.49024, 94.76032000000001, 71.06976], "category_id": 45, "id": 8}, {"segmentation": [[385.52992, 60.030002999999994, 600.50016, 60.030002999999994, 600.50016, 357.19013700000005, 385.52992, 357.19013700000005]], "area": 63880.58532441216, "iscrowd": 0, "image_id": 2, "bbox": [385.52992, 60.030002999999994, 214.97024, 297.160134], "category_id": 71, "id": 9}, {"segmentation": [[53.01024000000001, 356.49000599999994, 185.04032, 356.49000599999994, 185.04032, 411.6800099999999, 53.01024000000001, 411.6800099999999]], "area": 7286.7406433203205, 
"iscrowd": 0, "image_id": 2, "bbox": [53.01024000000001, 356.49000599999994, 132.03008, 55.190004], "category_id": 27, "id": 10}, {"segmentation": [[204.86016, 31.019728000000015, 459.74016, 31.019728000000015, 459.74016, 355.13984800000003, 204.86016, 355.13984800000003]], "area": 82611.73618559999, "iscrowd": 0, "image_id": 3, "bbox": [204.86016, 31.019728000000015, 254.88, 324.12012], "category_id": 27, "id": 11}, {"segmentation": [[237.56032, 155.809976, 403.96032, 155.809976, 403.96032, 351.060152, 237.56032, 351.060152]], "area": 32489.6292864, "iscrowd": 0, "image_id": 3, "bbox": [237.56032, 155.809976, 166.4, 195.25017599999998], "category_id": 58, "id": 12}, {"segmentation": [[0.960000000000008, 20.060000000000002, 442.19007999999997, 20.060000000000002, 442.19007999999997, 399.21015, 0.960000000000008, 399.21015]], "area": 167292.451016512, "iscrowd": 0, "image_id": 4, "bbox": [0.960000000000008, 20.060000000000002, 441.23008, 379.15015], "category_id": 19, "id": 13}, {"segmentation": [[0, 50.11967999999999, 457.680158, 50.11967999999999, 457.680158, 480.46975999999995, 0, 480.46975999999995]], "area": 196962.69260971263, "iscrowd": 0, "image_id": 5, "bbox": [0, 50.11967999999999, 457.680158, 430.35008], "category_id": 35, "id": 14}, {"segmentation": [[167.5801595, 162.88991999999993, 478.19023849999996, 162.88991999999993, 478.19023849999996, 628.0796799999999, 167.5801595, 628.0796799999999]], "area": 144492.62810359104, "iscrowd": 0, "image_id": 5, "bbox": [167.5801595, 162.88991999999993, 310.610079, 465.18976000000004], "category_id": 57, "id": 15}], "categories": [{"id": 1, "name": "0", "supercategory": "0"}, {"id": 2, "name": "1", "supercategory": "1"}, {"id": 3, "name": "2", "supercategory": "2"}, {"id": 4, "name": "3", "supercategory": "3"}, {"id": 5, "name": "4", "supercategory": "4"}, {"id": 6, "name": "5", "supercategory": "5"}, {"id": 7, "name": "6", "supercategory": "6"}, {"id": 8, "name": "7", "supercategory": "7"}, {"id": 9, "name": "8", "supercategory": "8"}, {"id": 10, "name": "9", "supercategory": "9"}, {"id": 11, "name": "10", "supercategory": "10"}, {"id": 12, "name": "11", "supercategory": "11"}, {"id": 13, "name": "12", "supercategory": "12"}, {"id": 14, "name": "13", "supercategory": "13"}, {"id": 15, "name": "14", "supercategory": "14"}, {"id": 16, "name": "15", "supercategory": "15"}, {"id": 17, "name": "16", "supercategory": "16"}, {"id": 18, "name": "17", "supercategory": "17"}, {"id": 19, "name": "18", "supercategory": "18"}, {"id": 20, "name": "19", "supercategory": "19"}, {"id": 21, "name": "20", "supercategory": "20"}, {"id": 22, "name": "21", "supercategory": "21"}, {"id": 23, "name": "22", "supercategory": "22"}, {"id": 24, "name": "23", "supercategory": "23"}, {"id": 25, "name": "24", "supercategory": "24"}, {"id": 26, "name": "25", "supercategory": "25"}, {"id": 27, "name": "26", "supercategory": "26"}, {"id": 28, "name": "27", "supercategory": "27"}, {"id": 29, "name": "28", "supercategory": "28"}, {"id": 30, "name": "29", "supercategory": "29"}, {"id": 31, "name": "30", "supercategory": "30"}, {"id": 32, "name": "31", "supercategory": "31"}, {"id": 33, "name": "32", "supercategory": "32"}, {"id": 34, "name": "33", "supercategory": "33"}, {"id": 35, "name": "34", "supercategory": "34"}, {"id": 36, "name": "35", "supercategory": "35"}, {"id": 37, "name": "36", "supercategory": "36"}, {"id": 38, "name": "37", "supercategory": "37"}, {"id": 39, "name": "38", "supercategory": "38"}, {"id": 40, "name": "39", "supercategory": "39"}, {"id": 41, 
"name": "40", "supercategory": "40"}, {"id": 42, "name": "41", "supercategory": "41"}, {"id": 43, "name": "42", "supercategory": "42"}, {"id": 44, "name": "43", "supercategory": "43"}, {"id": 45, "name": "44", "supercategory": "44"}, {"id": 46, "name": "45", "supercategory": "45"}, {"id": 47, "name": "46", "supercategory": "46"}, {"id": 48, "name": "47", "supercategory": "47"}, {"id": 49, "name": "48", "supercategory": "48"}, {"id": 50, "name": "49", "supercategory": "49"}, {"id": 51, "name": "50", "supercategory": "50"}, {"id": 52, "name": "51", "supercategory": "51"}, {"id": 53, "name": "52", "supercategory": "52"}, {"id": 54, "name": "53", "supercategory": "53"}, {"id": 55, "name": "54", "supercategory": "54"}, {"id": 56, "name": "55", "supercategory": "55"}, {"id": 57, "name": "56", "supercategory": "56"}, {"id": 58, "name": "57", "supercategory": "57"}, {"id": 59, "name": "58", "supercategory": "58"}, {"id": 60, "name": "59", "supercategory": "59"}, {"id": 61, "name": "60", "supercategory": "60"}, {"id": 62, "name": "61", "supercategory": "61"}, {"id": 63, "name": "62", "supercategory": "62"}, {"id": 64, "name": "63", "supercategory": "63"}, {"id": 65, "name": "64", "supercategory": "64"}, {"id": 66, "name": "65", "supercategory": "65"}, {"id": 67, "name": "66", "supercategory": "66"}, {"id": 68, "name": "67", "supercategory": "67"}, {"id": 69, "name": "68", "supercategory": "68"}, {"id": 70, "name": "69", "supercategory": "69"}, {"id": 71, "name": "70", "supercategory": "70"}]}
5. Config file
Copy exps\default\yolox_s.py to exps\yolox_s_me.py and change it as follows:
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

import os

from yolox.exp import Exp as MyExp

"""
Notes for quick training on custom data:
1. Under datasets/my_data/annotations, keep the same file names as coco128
   (instances_train2017.json and instances_val2017.json) so they can be used directly.
2. Name the image folders train2017 and val2017 as well (the COCO dataset class uses these names).
3. In yolox/data/datasets/coco_classes.py, set COCO_CLASSES to the class names of this dataset.
"""
from yolox.data.datasets.coco_classes import COCO_CLASSES  # edit the class names here before running

DATA_DIR = "datasets/my_data"  # the custom dataset for this training run


class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define yourself dataset path
        self.data_dir = DATA_DIR
        self.train_ann = "instances_train2017.json"
        self.val_ann = "instances_val2017.json"

        self.num_classes = len(COCO_CLASSES)
        self.max_epoch = 50
        self.data_num_workers = 4
        self.eval_interval = 10   # evaluate every N epochs
        self.print_interval = 10  # print every N iterations
        self.save_history_ckpt = False  # only keep the latest checkpoint; True keeps one per epoch
Two points about the code above.
First, I kept the COCO dataset file names unchanged, because the Exp base class uses the COCO annotation file names directly.
Second, edit COCO_CLASSES in yolox\data\datasets\coco_classes.py to hold your own classes, so the config file can simply import it, e.g. COCO_CLASSES = ('c1', 'c2').
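For reference, the edited coco_classes.py can be as small as this ('c1' and 'c2' are placeholders for your own labels; keep the order consistent with the category ids in your annotation JSON so the visualized names line up):
# yolox/data/datasets/coco_classes.py
COCO_CLASSES = (
    "c1",
    "c2",
)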
6. Training
Copy tools\train.py to the repo root as, say, my_train.py, and change only the main block: point it at your config file and set a few hyperparameters, leaving everything above it untouched:
if __name__ == "__main__":
    # export CUDA_VISIBLE_DEVICES=1  # run this in the shell first to expose GPU 1 (must be upper case)
    configure_module()
    args = make_parser().parse_args()

    # For custom data, these four overrides are the main change.
    # Training output goes to YOLOX_outputs/<config file name> by default.
    args.exp_file = "exps/yolox_s_me.py"   # the config file defined above (continue training on top of the pretrained weights)
    args.devices = 1                       # train on one GPU (the parser flag is -d/--devices)
    args.batch_size = 16
    args.ckpt = "checkpoints/yolox_s.pth"  # pretrained weights; download per the README (yolox_l etc. pair with the larger config files)

    exp = get_exp(args.exp_file, args.name)
    exp.merge(args.opts)
    check_exp_value(exp)

    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    num_gpu = get_num_devices() if args.devices is None else args.devices
    assert num_gpu <= get_num_devices()

    if args.cache is not None:
        exp.dataset = exp.get_dataset(cache=True, cache_type=args.cache)

    dist_url = "auto" if args.dist_url is None else args.dist_url
    launch(
        main,
        num_gpu,
        args.num_machines,
        args.machine_rank,
        backend=args.dist_backend,
        dist_url=dist_url,
        args=(exp, args),
    )
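With those overrides in place, training is launched from the repo root (assuming the copy is named my_train.py as above):
export CUDA_VISIBLE_DEVICES=1   # optional: expose only GPU 1
python my_train.py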
After training finishes, YOLOX_outputs\yolox_s_me contains the logs, the best_ckpt.pth model, and so on.
7. Validation
Copy tools/eval.py to the repo root as my_eval.py. For a simple check, leave the rest alone and just set the config file, your best_ckpt.pth, and a few options in the main block; this evaluates the validation set:
if __name__ == "__main__":
    """
    To run on GPU:
    1. In the shell, run export CUDA_VISIBLE_DEVICES=1 to expose GPU 1.
    2. In the code, args.devices = 1 means use one GPU.
    """
    args = make_parser().parse_args()  # parse the default CLI args first (as in tools/eval.py)

    # custom-data evaluation
    from yolox.data.datasets.coco_classes import COCO_CLASSES  # edit the class names here before running
    args.exp_file = "exps/yolox_s_me.py"
    args.ckpt = "YOLOX_outputs/yolox_s_me/best_ckpt.pth"
    args.batch_size = 2
    args.devices = 1          # one GPU (the parser flag is -d/--devices)
    args.conf = 0.5
    args.nms = 0.5
    args.test = False         # no test split here; evaluate the val set only

    exp = get_exp(args.exp_file, args.name)
    exp.merge(args.opts)

    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    num_gpu = torch.cuda.device_count() if args.devices is None else args.devices
    assert num_gpu <= torch.cuda.device_count()

    dist_url = "auto" if args.dist_url is None else args.dist_url
    launch(
        main,
        num_gpu,
        args.num_machines,
        args.machine_rank,
        backend=args.dist_backend,
        dist_url=dist_url,
        args=(exp, args, num_gpu),
    )
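Then run it from the repo root (assuming the copy is named my_eval.py); the COCO-style AP summary for the val set is printed at the end:
python my_eval.py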
8. Confusion-matrix evaluation
Object detection is usually reported as mAP, which is awkward for reporting to management; they just want to know how many objects were right and how many were wrong, so it helps to convert the results into something like a classification confusion matrix with precision and recall. (Note: the unmodified demo.py is for plain image inference: if args.path is an image it infers that one image, if it is a folder it infers every image in it, and the results again land in YOLOX_outputs.)
Below, I copied tools/demo.py to demo_metric.py and added some logic to evaluate precision and recall:
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.
import argparse
import json
import os
import time
import numpy as np
from loguru import logger
import cv2
import torch
from yolox.data.data_augment import ValTransform
from yolox.exp import get_exp
from yolox.utils import fuse_model, get_model_info, postprocess, vis
IMAGE_EXT = [".jpg", ".jpeg", ".webp", ".bmp", ".png"]
# Calculate precision and recall for object detection
def calculate_precision_recall(detected_boxes, true_boxes, iou_threshold):
    """
    Calculate precision and recall for object detection
    :param detected_boxes: list of detected bounding boxes in format [xmin, ymin, xmax, ymax]
    :param true_boxes: list of true bounding boxes in format [xmin, ymin, xmax, ymax]
    :param iou_threshold: intersection over union threshold for matching detected and true boxes
    :return: precision and recall
    """
    num_true_boxes = len(true_boxes)
    true_positives = 0
    for true_box in true_boxes:
        max_iou = 0
        for detected_box in detected_boxes:
            iou = calculate_iou(detected_box, true_box)
            if iou > max_iou:
                max_iou = iou
            if max_iou >= iou_threshold:
                true_positives += 1
                break
    false_positives = len(detected_boxes) - true_positives
    false_negatives = num_true_boxes - true_positives
    # guard against empty detections / empty ground truth to avoid division by zero
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0.0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0.0
    # print('TP: ', true_positives)
    # print('FP: ', false_positives)
    # print('FN: ', false_negatives)
    return precision, recall
def calculate_iou(box1, box2):
    """
    Calculate intersection over union (IoU) between two bounding boxes
    :param box1: bounding box in format [xmin, ymin, xmax, ymax]
    :param box2: bounding box in format [xmin, ymin, xmax, ymax]
    :return: IoU between box1 and box2
    """
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_box1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area_box2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area_box1 + area_box2 - intersection
    iou = intersection / union
    return iou
def make_parser():
    parser = argparse.ArgumentParser("YOLOX Demo!")
    # parser.add_argument(
    #     "demo", default="image", help="demo type, eg. image, video and webcam"
    # )
    parser.add_argument(
        "--demo", default="image", help="demo type, eg. image, video and webcam"
    )
    parser.add_argument("-expn", "--experiment-name", type=str, default=None)
    parser.add_argument("-n", "--name", type=str, default=None, help="model name")
    parser.add_argument(
        "--path", default="./assets/dog.jpg", help="path to images or video"
    )
    parser.add_argument("--camid", type=int, default=0, help="webcam demo camera id")
    parser.add_argument(
        "--save_result",
        action="store_true",
        help="whether to save the inference result of image/video",
    )
    # exp file
    parser.add_argument(
        "-f",
        "--exp_file",
        default=None,
        type=str,
        help="please input your experiment description file",
    )
    parser.add_argument("-c", "--ckpt", default=None, type=str, help="ckpt for eval")
    parser.add_argument(
        "--device",
        default="cpu",
        type=str,
        help="device to run our model, can either be cpu or gpu",
    )
    parser.add_argument("--conf", default=0.3, type=float, help="test conf")
    parser.add_argument("--nms", default=0.3, type=float, help="test nms threshold")
    parser.add_argument("--tsize", default=None, type=int, help="test img size")
    parser.add_argument(
        "--fp16",
        dest="fp16",
        default=False,
        action="store_true",
        help="Adopting mix precision evaluating.",
    )
    parser.add_argument(
        "--legacy",
        dest="legacy",
        default=False,
        action="store_true",
        help="To be compatible with older versions",
    )
    parser.add_argument(
        "--fuse",
        dest="fuse",
        default=False,
        action="store_true",
        help="Fuse conv and bn for testing.",
    )
    parser.add_argument(
        "--trt",
        dest="trt",
        default=False,
        action="store_true",
        help="Using TensorRT model for testing.",
    )
    return parser
def get_image_list(path):
    image_names = []
    for maindir, subdir, file_name_list in os.walk(path):
        for filename in file_name_list:
            apath = os.path.join(maindir, filename)
            ext = os.path.splitext(apath)[1]
            if ext in IMAGE_EXT:
                image_names.append(apath)
    return image_names
class Predictor(object):
    def __init__(
        self,
        model,
        exp,
        cls_names,
        trt_file=None,
        decoder=None,
        device="cpu",
        fp16=False,
        legacy=False,
    ):
        self.model = model
        self.cls_names = cls_names
        self.decoder = decoder
        self.num_classes = exp.num_classes
        self.confthre = exp.test_conf
        self.nmsthre = exp.nmsthre
        self.test_size = exp.test_size
        self.device = device
        self.fp16 = fp16
        self.preproc = ValTransform(legacy=legacy)
        if trt_file is not None:
            from torch2trt import TRTModule

            model_trt = TRTModule()
            model_trt.load_state_dict(torch.load(trt_file))

            x = torch.ones(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
            self.model(x)
            self.model = model_trt

    def inference(self, img):
        img_info = {"id": 0}
        if isinstance(img, str):
            img_info["file_name"] = os.path.basename(img)
            img = cv2.imread(img)
        else:
            img_info["file_name"] = None

        height, width = img.shape[:2]
        img_info["height"] = height
        img_info["width"] = width
        img_info["raw_img"] = img

        ratio = min(self.test_size[0] / img.shape[0], self.test_size[1] / img.shape[1])
        img_info["ratio"] = ratio

        img, _ = self.preproc(img, None, self.test_size)
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.float()
        if self.device == "gpu":
            img = img.cuda()
            if self.fp16:
                img = img.half()  # to FP16

        with torch.no_grad():
            t0 = time.time()
            outputs = self.model(img)
            if self.decoder is not None:
                outputs = self.decoder(outputs, dtype=outputs.type())
            outputs = postprocess(
                outputs, self.num_classes, self.confthre,
                self.nmsthre, class_agnostic=True
            )  # (x1, y1, x2, y2, obj_conf, class_conf, class_pred): objectness, per-class score, class index
            logger.info("Infer time: {:.4f}s".format(time.time() - t0))
        return outputs, img_info

    def visual(self, output, img_info, cls_conf=0.35):
        ratio = img_info["ratio"]
        img = img_info["raw_img"]
        if output is None:
            return img
        output = output.cpu()

        bboxes = output[:, 0:4]
        # preprocessing: resize
        bboxes /= ratio

        cls = output[:, 6]
        scores = output[:, 4] * output[:, 5]

        vis_res = vis(img, bboxes, scores, cls, cls_conf, self.cls_names)
        return vis_res
def image_demo(predictor, path, args):
    if os.path.isdir(path):
        files = get_image_list(path)
    else:
        files = [path]
    files.sort()
    current_time = time.localtime()
    # single image only
    outputs, img_info = predictor.inference(files[0])
    return outputs, img_info
def imageflow_demo(predictor, vis_folder, current_time, args):
    cap = cv2.VideoCapture(args.path if args.demo == "video" else args.camid)
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)  # float
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
    fps = cap.get(cv2.CAP_PROP_FPS)
    if args.save_result:
        save_folder = os.path.join(
            vis_folder, time.strftime("%Y_%m_%d_%H_%M_%S", current_time)
        )
        os.makedirs(save_folder, exist_ok=True)
        if args.demo == "video":
            save_path = os.path.join(save_folder, os.path.basename(args.path))
        else:
            save_path = os.path.join(save_folder, "camera.mp4")
        logger.info(f"video save_path is {save_path}")
        vid_writer = cv2.VideoWriter(
            save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (int(width), int(height))
        )
    while True:
        ret_val, frame = cap.read()
        if ret_val:
            outputs, img_info = predictor.inference(frame)
            result_frame = predictor.visual(outputs[0], img_info, predictor.confthre)
            if args.save_result:
                vid_writer.write(result_frame)
            else:
                cv2.namedWindow("yolox", cv2.WINDOW_NORMAL)
                cv2.imshow("yolox", result_frame)
            ch = cv2.waitKey(1)
            if ch == 27 or ch == ord("q") or ch == ord("Q"):
                break
        else:
            break
def main(exp, args, COCO_CLASSES_):
    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    file_name = os.path.join(exp.output_dir, args.experiment_name)
    os.makedirs(file_name, exist_ok=True)

    vis_folder = None
    if args.save_result:
        vis_folder = os.path.join(file_name, "vis_res")
        os.makedirs(vis_folder, exist_ok=True)

    if args.trt:
        args.device = "gpu"

    logger.info("Args: {}".format(args))

    if args.conf is not None:
        exp.test_conf = args.conf
    if args.nms is not None:
        exp.nmsthre = args.nms
    if args.tsize is not None:
        exp.test_size = (args.tsize, args.tsize)

    model = exp.get_model()
    logger.info("Model Summary: {}".format(get_model_info(model, exp.test_size)))

    if args.device == "gpu":
        model.cuda()
        if args.fp16:
            model.half()  # to FP16
    model.eval()

    if not args.trt:
        if args.ckpt is None:
            ckpt_file = os.path.join(file_name, "best_ckpt.pth")
        else:
            ckpt_file = args.ckpt
        logger.info("loading checkpoint")
        ckpt = torch.load(ckpt_file, map_location="cpu")
        # load the model state dict
        model.load_state_dict(ckpt["model"])
        logger.info("loaded checkpoint done.")

    if args.fuse:
        logger.info("\tFusing model...")
        model = fuse_model(model)

    if args.trt:
        assert not args.fuse, "TensorRT model is not support model fusing!"
        trt_file = os.path.join(file_name, "model_trt.pth")
        assert os.path.exists(
            trt_file
        ), "TensorRT model is not found!\n Run python3 tools/trt.py first!"
        model.head.decode_in_inference = False
        decoder = model.head.decode_outputs
        logger.info("Using TensorRT to inference")
    else:
        trt_file = None
        decoder = None

    predictor = Predictor(
        model=model, exp=exp, cls_names=COCO_CLASSES_, trt_file=trt_file, decoder=decoder,
        device=args.device, fp16=args.fp16, legacy=args.legacy,
    )
    # single image inference
    outputs, img_info = image_demo(predictor, args.path, args)
    return outputs, img_info
if __name__ == "__main__":
    args = make_parser().parse_args()

    # custom-data evaluation
    from yolox.data.datasets.coco_classes import COCO_CLASSES  # edit the class names here before running
    args.exp_file = "exps/yolox_s_me.py"
    args.ckpt = "YOLOX_outputs/yolox_s_me/ckpt_200.pth"
    data_path = "datasets/xxx/val2017"
    val_jsons = "datasets/xxx/annotations/instances_val2017.json"
    COCO_CLASSES_ = COCO_CLASSES
    args.conf = 0.5
    args.nms = 0.5
    args.device = "cpu"  # stay on cpu so outputs convert to numpy easily
    args.save_result = False

    # collect all ground-truth boxes from the val annotation file
    with open(val_jsons, "r") as f:
        val_labels = json.load(f)
    res_id_img = {}
    imgs = []
    for image_info in val_labels['images']:
        res_id_img[image_info['id']] = image_info['file_name']
        imgs.append(image_info['file_name'])
    bbox_res = {}
    for ann in val_labels['annotations']:
        # COCO bbox is [x, y, w, h]; convert to [xmin, ymin, xmax, ymax]
        bbox_new = [ann['bbox'][0], ann['bbox'][1], ann['bbox'][0] + ann['bbox'][2], ann['bbox'][1] + ann['bbox'][3]]
        if res_id_img[ann['image_id']] not in bbox_res.keys():
            bbox_res[res_id_img[ann['image_id']]] = [bbox_new]
        else:
            tmp_list = bbox_res[res_id_img[ann['image_id']]]
            tmp_list.append(bbox_new)
            bbox_res[res_id_img[ann['image_id']]] = tmp_list

    res_all = {}
    imgs = list(set(imgs))
    # imgs = ["1.jpg"]  # debug a single image
    for img in imgs:
        img_file = os.path.join(data_path, img)
        args.path = img_file
        # inference
        exp = get_exp(args.exp_file, args.name)
        res, img_info = main(exp, args, COCO_CLASSES_)
        res_list = np.array(res[0]).tolist() if res[0] is not None else []  # no detections -> empty list
        infer_bbox = [[ii[0] / img_info["ratio"], ii[1] / img_info["ratio"], ii[2] / img_info["ratio"], ii[3] / img_info["ratio"]] for ii in res_list]  # img_info carries the resize ratio
        if img not in res_all.keys():
            res_all[img] = [infer_bbox]
        else:
            tmp = res_all[img]
            tmp.append(infer_bbox)
            res_all[img] = tmp

    precision_all = []
    recall_all = []
    print()
    iou_threshold = 0.3  # IoU threshold used when matching boxes for precision/recall
    for img in imgs:
        precision, recall = calculate_precision_recall(detected_boxes=res_all[img][0], true_boxes=bbox_res[img], iou_threshold=iou_threshold)
        print(img, precision, recall)
        precision_all.append(precision)
        recall_all.append(recall)
    print()
    print('val-set mean precision ', float(sum(precision_all) / len(precision_all)))
    print('val-set mean recall    ', float(sum(recall_all) / len(recall_all)))
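As a reminder of what the per-image numbers mean: precision = TP / (TP + FP) and recall = TP / (TP + FN), where a detection counts as a true positive once its IoU with a ground-truth box reaches iou_threshold. A toy check of calculate_precision_recall with hand-made boxes (values are illustrative only, not from a real run):
gt = [[0, 0, 100, 100], [200, 200, 300, 300]]   # two ground-truth boxes
det = [[5, 5, 95, 95], [400, 400, 450, 450]]    # one good hit, one false positive
p, r = calculate_precision_recall(det, gt, iou_threshold=0.3)
print(p, r)  # precision 0.5 (1 TP out of 2 detections), recall 0.5 (1 TP out of 2 GT boxes)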
9. Exporting to ONNX
Likewise, copy tools/export_onnx.py, add the following to it, and run it to export the ONNX model; the remaining outputs stay under the path mentioned earlier:
args = make_parser().parse_args()
# add the following right below the args line in the main block:
os.makedirs("YOLOX_onnx", exist_ok=True)  # os.mkdir would fail on a second run if the folder already exists
args.output_name = "YOLOX_onnx/yolox_s_me.onnx"
args.exp_file = "exps/yolox_s_me.py"
args.ckpt = "YOLOX_outputs/yolox_s_me/best_ckpt.pth"
10. ONNX inference
The main things to get right are the input preprocessing (YOLOX defaults to 640x640) and the onnxruntime input format. Object detection on images with the exported ONNX model looks like this:
import os

import cv2
import numpy as np
import onnxruntime

from yolox.data.data_augment import preproc as preprocess
from yolox.utils.demo_utils import multiclass_nms, demo_postprocess
from yolox.utils.visualize import vis

"""
yolox_onnx inference demo
"""

if __name__ == '__main__':
    COCO_CLASSES = ('c1', 'c2', ...)  # your class names
    data_path = "test_datas"          # folder of test images
    model = "/xxx/yolox_s_me.onnx"    # the ONNX model exported above
    output_dir = "/xxx/xxx/result"    # folder for the visualized results
    score_thr = 0.5
    NMS = 0.5

    input_shape = "640, 640"
    input_shape = tuple(map(int, input_shape.split(',')))
    session = onnxruntime.InferenceSession(model)  # create the session once, outside the image loop

    for img in os.listdir(data_path):
        image_path = os.path.join(data_path, img)
        origin_img = cv2.imread(image_path)
        img, ratio = preprocess(origin_img, input_shape)

        ort_inputs = {session.get_inputs()[0].name: img[None, :, :, :]}  # ort input: 1 x 3 x 640 x 640
        output = session.run(None, ort_inputs)  # output shape [1, 8400, 12] in my case
        predictions = demo_postprocess(output[0], input_shape)[0]

        boxes = predictions[:, :4]
        scores = predictions[:, 4:5] * predictions[:, 5:]

        # convert center-x, center-y, w, h to xmin, ymin, xmax, ymax and undo the resize
        boxes_xyxy = np.ones_like(boxes)
        boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
        boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
        boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
        boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
        boxes_xyxy /= ratio

        dets = multiclass_nms(boxes_xyxy, scores, nms_thr=NMS, score_thr=score_thr)
        if dets is not None:
            final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
            origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
                             conf=score_thr, class_names=COCO_CLASSES)

        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        output_path = os.path.join(output_dir, os.path.basename(image_path))
        cv2.imwrite(output_path, origin_img)
        print('one img infer ok.')
    print('all img infer ok !!!')
-- Appendix: Docker environment
- Dockerfile (pull a torch base image from the Aliyun registry):
# https://www.modelscope.cn/docs/环境安装  # GPU image (python3.10)
FROM registry.cn-beijing.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0
# FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0
# FROM registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0
RUN mkdir /opt/code
WORKDIR /opt/code
- Build the image: docker build -t hxy_base_image .
- Create the container: docker run --name hxy_mmcls -d -p 9528:22 --shm-size=1g hxy_base_image tail -f /dev/null
  (-d runs it in the background, -p maps the port, --shm-size sets the shared-memory size, tail -f /dev/null just keeps the container alive)
- docker run has other options that can be added as needed.
- docker exec -it <container id> /bin/bash: opens a shell inside the container
- docker images and docker ps | grep hxy: list images and containers, etc.
For YOLOX, the only extra package to install inside is pip install loguru; after that it should be ready to use.
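If the YOLOX repo lives on the host, it can also be mounted into the container at creation time; a sketch (paths and the container name are placeholders, and --gpus all assumes the NVIDIA container toolkit is installed):
docker run --name hxy_yolox -d -p 9528:22 --shm-size=8g --gpus all -v /path/to/YOLOX:/opt/code/YOLOX hxy_base_image tail -f /dev/null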
- Extra: exposing ssh and so on:
In vim /etc/ssh/sshd_config, enable the following settings:
Port 22
AddressFamily any
ListenAddress 0.0.0.0
PermitRootLogin yes
PermitEmptyPasswords yes
PasswordAuthentication yes
# restart ssh
service ssh restart
# set the root password: passwd root
From outside you can then log in as root/root with the host IP and mapped port (for other shells, PyCharm, or other IDE remote logins).
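For example, from another machine (replace <host-ip> with the host's address; 9528 is the port mapped above):
ssh root@<host-ip> -p 9528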