How to use ONNX Runtime

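Who created ONNX Runtime?

ONNX Runtime is a deep learning inference engine developed and maintained by Microsoft, which open-sourced it in 2018. It is sometimes described as the successor to the Microsoft Cognitive Toolkit (CNTK), but that is not accurate: CNTK is a separate Microsoft deep learning framework, aimed mainly at training, that started as an internal tool and was open-sourced in 2016.

As TensorFlow, PyTorch, and other frameworks came to dominate, CNTK faced stiff competition, and Microsoft put more of its effort into ONNX, an open model format it introduced together with Facebook in 2017. ONNX Runtime was built to run models in that format efficiently, so a model exported to ONNX from another framework can be served by one common runtime.

In short, ONNX Runtime is a Microsoft-developed and Microsoft-maintained inference engine built around the ONNX standard. It grew out of the same shift that moved Microsoft away from CNTK, but it is a distinct project rather than a repackaging of CNTK.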

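What is the difference between ONNX Runtime, TensorRT, and OpenVINO?

ONNX Runtime, TensorRT, and OpenVINO are all deep learning inference frameworks: each takes a trained model and runs it on target hardware. All three aim for high performance, but they differ in scope.

ONNX Runtime is a cross-platform inference engine that runs on multiple operating systems and on hardware such as CPUs, GPUs, and FPGAs. It loads models in ONNX format, executes them on the chosen hardware, and provides APIs and tools for model optimization, deployment, and debugging. Compared with TensorRT and OpenVINO it is the most general of the three: it targets more hardware and operating systems, and it is more flexible and extensible.

TensorRT is NVIDIA's inference library, aimed at accelerating inference on NVIDIA GPUs. It applies optimizations such as layer fusion and reduced-precision (FP16/INT8) execution to lower latency and memory use. Compared with ONNX Runtime it is narrower in scope, and that specialization often pays off in speed on NVIDIA hardware.

OpenVINO is Intel's inference toolkit, aimed at running inference on Intel CPUs, GPUs, and VPUs. It offers model optimizations such as pruning, quantization, and layer-level graph optimizations to improve performance and reduce model size. Compared with ONNX Runtime and TensorRT, it is the one specialized for Intel hardware and is tuned to perform well there.

All three are solid inference frameworks with their own strengths. Which one to use depends on your requirements and, above all, on the hardware you deploy to.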
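
The three are not mutually exclusive. ONNX Runtime delegates execution to pluggable "execution providers", and suitably built ONNX Runtime packages expose TensorRT and OpenVINO as such providers. The sketch below shows one way to choose among whatever providers an installation reports. The helper names pick_providers and create_session and the preference order are assumptions made for illustration; get_available_providers and the providers argument of InferenceSession are part of the ONNX Runtime Python API.

```python
def pick_providers(available, preferred=("TensorrtExecutionProvider",
                                         "OpenVINOExecutionProvider",
                                         "CUDAExecutionProvider",
                                         "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to the CPU provider so a session can always be created.
    return chosen or ["CPUExecutionProvider"]


def create_session(model_path):
    # Imported lazily so pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort
    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a CPU-only installation this simply resolves to the CPU provider, so the same code can run unchanged on machines with and without accelerators.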

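What is ONNX Runtime?

ONNX Runtime is an open-source engine for running ONNX models efficiently. ONNX is an open exchange format for deep learning models: a model trained in one framework can be exported to ONNX and then deployed elsewhere, which makes cross-platform and cross-framework deployment possible.

ONNX Runtime provides a heavily optimized inference engine that runs ONNX models with low latency and high throughput on many kinds of hardware. It has bindings for Python, C++, C#, Java, and JavaScript, and it can run models exported to ONNX from frameworks such as PyTorch, TensorFlow, Keras, and Caffe2.

The advantages of ONNX Runtime include:

Efficiency: ONNX Runtime applies optimizations such as graph optimization, node fusion, and memory reuse, so ONNX models run with low latency and high throughput.
Cross-platform: ONNX Runtime supports many hardware targets, including CPUs, GPUs, FPGAs, and DSPs, so inference can run on a wide range of devices.
Cross-framework: because it consumes the ONNX format, ONNX Runtime can run models that originated in different frameworks. The conversion itself is done by each framework's ONNX exporter, not by ONNX Runtime.
Ease of use: ONNX Runtime offers a small, simple API for loading and running ONNX models, available in several programming languages.

In short, ONNX Runtime is a capable, efficient, cross-platform inference engine that makes it straightforward to deploy models from different frameworks on different hardware.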

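How do you use ONNX Runtime?

Running inference with ONNX Runtime takes the following steps:

Install ONNX Runtime: install ONNX Runtime on your machine first. You can download binaries for your system from the official site (https://www.onnxruntime.ai/) or install the Python package with pip (pip install onnxruntime).
Load the model: load an ONNX model through the onnxruntime.InferenceSession class. For example: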

import onnxruntime
sess = onnxruntime.InferenceSession('model.onnx')

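Prepare the input data: before running inference, convert the input into the layout and data type the model expects. In Python the input is normally passed as a NumPy array whose dtype matches the model's input type (float32 here). For example: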

import numpy as np
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)

Run inference: call the run method of InferenceSession to run the model and get the outputs. For example:

output = sess.run(None, {'input': input_data})

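The first argument, None, requests all outputs. The second argument is a dictionary that maps each input name to its data; the key 'input' must match the input name stored in the model, which you can read from sess.get_inputs()[0].name.

Parse the outputs: interpret the results according to the model's output structure. run returns a list of NumPy arrays, one per model output. For a model with exactly two outputs, for example: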

output1, output2 = output

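Note that the model must first be exported to ONNX format. Further optimization is optional but can help performance: ONNX Runtime applies graph optimizations when it loads a model, and the onnxruntime-tools package, which comes from the ONNX Runtime project, offers additional optimization and conversion utilities. Because ONNX Runtime supports many hardware targets and optimization options, expect to tune the configuration for your particular deployment.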
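
The steps above can be put together in a way that does not hard-code the input name. This is a minimal sketch rather than a definitive recipe: the function names run_model and main, the file name model.onnx, and the 1x3x224x224 dummy input are illustrative assumptions, while get_inputs, get_outputs, and run are methods of onnxruntime.InferenceSession.

```python
def run_model(session, input_array):
    """Run one forward pass and return the outputs keyed by output name."""
    # Ask the session for the real names instead of guessing 'input'.
    input_name = session.get_inputs()[0].name
    output_names = [o.name for o in session.get_outputs()]
    outputs = session.run(output_names, {input_name: input_array})
    return dict(zip(output_names, outputs))


def main(model_path="model.onnx"):
    # Lazy imports: only needed when a real model is actually run.
    import numpy as np
    import onnxruntime as ort
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
    for name, value in run_model(session, dummy).items():
        print(name, value.shape)
```

Returning a dictionary keyed by output name avoids relying on the number and order of outputs, which differ from model to model.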

Open-source project

https://github.com/hpc203/nanodet-plus-opencv
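Deploys NanoDet-Plus with OpenCV, with both C++ and Python versions of the program.

The repository contains programs that deploy the NanoDet-Plus object detector with OpenCV and with ONNX Runtime, each with source code in both C++ and Python.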

ort_session: the ONNX Runtime session

ort_session is the ONNX Runtime session object, available in both C++ and Python. The session is one of ONNX Runtime's core components: it loads an ONNX model into memory and executes it, returning the model's outputs.

In C++, a session can be created like this:

Ort::SessionOptions session_options;
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
// The model path is ORTCHAR_T*: wchar_t on Windows (as here), plain char elsewhere.
std::wstring model_path = L"model.onnx";
Ort::Session ort_session(env, model_path.c_str(), session_options);

In Python, a session can be created like this:

import onnxruntime as ort

model_path = "model.onnx"
ort_session = ort.InferenceSession(model_path)

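In both C++ and Python, the session's Run() method (run() in Python) executes inference and returns the model's outputs. The input must first be converted to the layout the model expects. In C++ the data is wrapped in an Ort::Value tensor; in Python no explicit wrapping is needed, because run() accepts NumPy arrays directly.

The session object covers the whole inference workflow: setting session options, loading the model, feeding inputs, running inference, and returning outputs for parsing. This makes ONNX Runtime a capable and easy-to-use inference framework, and it is widely used in areas such as computer vision, natural language processing, and speech recognition.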
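
A session also exposes the model's input and output metadata, and the NanoDet-Plus code later in this post relies on it to derive the input size and reg_max. The sketch below reproduces that arithmetic in Python. The function names derive_model_params and describe_session are hypothetical, and the shapes in the usage note are illustrative values for a 320x320 model with 80 classes, not values read from a real file.

```python
def derive_model_params(input_shape, output_shape, num_classes):
    """Derive (height, width, reg_max) from an NCHW input shape and the output shape."""
    in_h, in_w = input_shape[2], input_shape[3]
    # Each output row holds num_classes scores plus 4 * (reg_max + 1) distance bins.
    reg_max = (output_shape[-1] - num_classes) // 4 - 1
    return in_h, in_w, reg_max


def describe_session(session, num_classes):
    # Intended for an onnxruntime.InferenceSession: get_inputs() and get_outputs()
    # return objects that expose .name and .shape.
    inp = session.get_inputs()[0]
    out = session.get_outputs()[0]
    return inp.name, out.name, derive_model_params(inp.shape, out.shape, num_classes)
```

For example, derive_model_params([1, 3, 320, 320], [1, 2125, 112], 80) gives a 320x320 input and reg_max = 7, since (112 - 80) / 4 - 1 = 7.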

main.cpp

#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <array>
#include <algorithm>
#include <cmath>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
//#include <cuda_provider_factory.h>
#include <onnxruntime_cxx_api.h>

using namespace cv;
using namespace std;
using namespace Ort;
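/**
BoxInfo is a custom struct that stores one bounding box from the detector's output. Its fields are:

- `x1`: x coordinate of the box's top-left corner.
- `y1`: y coordinate of the box's top-left corner.
- `x2`: x coordinate of the box's bottom-right corner.
- `y2`: y coordinate of the box's bottom-right corner.
- `score`: confidence score of the box.
- `label`: class label of the detected object.

Each detection is stored as one BoxInfo. In practice, all detections are collected in a vector<BoxInfo> and drawn onto the image with OpenCV's rectangle and putText functions for visualization and analysis.
**/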
typedef struct BoxInfo
{
	float x1;
	float y1;
	float x2;
	float y2;
	float score;
	int label;
} BoxInfo;

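/**
Declaration of the NanoDet_Plus class, which runs inference for the detection model. Its private members are:

- `score_threshold`: confidence threshold for detections; defaults to 0.5.
- `nms_threshold`: IoU threshold for non-maximum suppression; defaults to 0.5.
- `class_names`: list of class names.
- `num_class`: number of classes.
- `input_image_`: the preprocessed input image data.
- `keep_ratio`: whether to preserve the image's aspect ratio when resizing.
- `inpWidth`: width of the model input.
- `inpHeight`: height of the model input.
- `reg_max`: largest bin index of the discrete distance distribution predicted for each box side (each side has reg_max + 1 bins).
- `num_stages`: number of feature-map levels (stages).
- `stride`: stride of each stage.
- `mean`: per-channel image mean (BGR order).
- `stds`: per-channel image standard deviation (BGR order).
- `env`: the ONNX Runtime environment.
- `ort_session`: the ONNX Runtime session.
- `sessionOptions`: session options.
- `input_names`: names of the input nodes.
- `output_names`: names of the output nodes.
- `input_node_dims`: dimensions of the input nodes.
- `output_node_dims`: dimensions of the output nodes.

The class has a constructor and a detect function. The constructor loads the model, the class-name list, and the thresholds. detect preprocesses the input image, runs inference through ONNX Runtime, converts the model output into boxes, applies non-maximum suppression, and draws the surviving detections on the image.
**/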
class NanoDet_Plus
{
public:
	NanoDet_Plus(string model_path, string classesFile, float nms_threshold, float objThreshold);
	void detect(Mat& cv_image);
private:
	float score_threshold = 0.5;
	float nms_threshold = 0.5;
	vector<string> class_names;
	int num_class;

	Mat resize_image(Mat srcimg, int *newh, int *neww, int *top, int *left);
	vector<float> input_image_;
	void normalize_(Mat img);
	void softmax_(const float* x, float* y, int length);
	void generate_proposal(vector<BoxInfo>& generate_boxes, const float* preds);
	void nms(vector<BoxInfo>& input_boxes);
	const bool keep_ratio = false;
	int inpWidth;
	int inpHeight;
	int reg_max;
	const int num_stages = 4;
	const int stride[4] = { 8,16,32,64 };
	const float mean[3] = { 103.53, 116.28, 123.675 };
	const float stds[3] = { 57.375, 57.12, 58.395 };

	Env env = Env(ORT_LOGGING_LEVEL_ERROR, "nanodetplus");
	Ort::Session *ort_session = nullptr;
	SessionOptions sessionOptions = SessionOptions();
	vector<char*> input_names;
	vector<char*> output_names;
	vector<vector<int64_t>> input_node_dims; // >=1 inputs
	vector<vector<int64_t>> output_node_dims; // >=1 outputs
};
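/**
Constructor of NanoDet_Plus: loads the class names, the thresholds, and the model.

It first reads the class names from a file and sets the class count, the NMS threshold, and the confidence threshold.
It then converts the model path to std::wstring and loads the model with ONNX Runtime's Session class. GetInputCount and GetOutputCount give the number of input and output nodes, whose names and shapes are then queried one by one.
Finally it derives the input height and width from the input shape, and reg_max (the largest distance-bin index) from the last dimension of the output shape.

Note the commented-out line that appends the CUDA execution provider. To use CUDA acceleration, install CUDA and use an ONNX Runtime build that includes CUDA support, then enable that line.
**/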
NanoDet_Plus::NanoDet_Plus(string model_path, string classesFile, float nms_threshold, float objThreshold)
{
	ifstream ifs(classesFile.c_str());
	string line;
	while (getline(ifs, line)) this->class_names.push_back(line);
	this->num_class = class_names.size();
	this->nms_threshold = nms_threshold;
	this->score_threshold = objThreshold;

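	// ORT takes a wchar_t path on Windows; on other platforms pass model_path.c_str() directly.
	// This element-wise widening is only safe for ASCII paths.
	std::wstring widestr = std::wstring(model_path.begin(), model_path.end());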
	//OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, 0);
	sessionOptions.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);
	ort_session = new Session(env, widestr.c_str(), sessionOptions);
	size_t numInputNodes = ort_session->GetInputCount();
	size_t numOutputNodes = ort_session->GetOutputCount();
	AllocatorWithDefaultOptions allocator;
	for (int i = 0; i < numInputNodes; i++)
	{
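		// Newer ONNX Runtime releases (about 1.13 onward) replace GetInputName/GetOutputName
		// with GetInputNameAllocated/GetOutputNameAllocated.
		input_names.push_back(ort_session->GetInputName(i, allocator));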
		Ort::TypeInfo input_type_info = ort_session->GetInputTypeInfo(i);
		auto input_tensor_info = input_type_info.GetTensorTypeAndShapeInfo();
		auto input_dims = input_tensor_info.GetShape();
		input_node_dims.push_back(input_dims);
	}
	for (int i = 0; i < numOutputNodes; i++)
	{
		output_names.push_back(ort_session->GetOutputName(i, allocator));
		Ort::TypeInfo output_type_info = ort_session->GetOutputTypeInfo(i);
		auto output_tensor_info = output_type_info.GetTensorTypeAndShapeInfo();
		auto output_dims = output_tensor_info.GetShape();
		output_node_dims.push_back(output_dims);
		/*for (int j = 0; j < output_dims.size(); j++)
		{
			cout << output_dims[j] << ",";
		}
		cout << endl;*/
	}
	this->inpHeight = input_node_dims[0][2];
	this->inpWidth = input_node_dims[0][3];
	this->reg_max = (output_node_dims[0][output_node_dims[0].size() - 1] - this->num_class) / 4 - 1;
}
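/**
resize_image is a private member of NanoDet_Plus that resizes the input image to the size the model expects. It takes the source image as a Mat, writes the resized height and width and the top and left padding offsets through its pointer parameters, and returns the resized image as a Mat.

It reads the source width and height, then checks keep_ratio. When the aspect ratio must be preserved, it computes the scaled size and the padding offsets from the source aspect ratio, resizes with OpenCV's resize, and pads with black borders via copyMakeBorder so the result still has the full model input size. Otherwise it resizes straight to the target size.

The point of this step is that the model only accepts its fixed input size; the returned size and offsets let detect map boxes back to the original image.
**/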
Mat NanoDet_Plus::resize_image(Mat srcimg, int *newh, int *neww, int *top, int *left)
{
	int srch = srcimg.rows, srcw = srcimg.cols;
	*newh = this->inpHeight;
	*neww = this->inpWidth;
	Mat dstimg;
	if (this->keep_ratio && srch != srcw) {
		float hw_scale = (float)srch / srcw;
		if (hw_scale > 1) {
			*newh = this->inpHeight;
			*neww = int(this->inpWidth / hw_scale);
			// cv::resize takes fx and fy before the interpolation flag, so pass 0, 0 explicitly
			resize(srcimg, dstimg, Size(*neww, *newh), 0, 0, INTER_AREA);
			*left = int((this->inpWidth - *neww) * 0.5);
			copyMakeBorder(dstimg, dstimg, 0, 0, *left, this->inpWidth - *neww - *left, BORDER_CONSTANT, 0);
		}
		else {
			*newh = (int)this->inpHeight * hw_scale;
			*neww = this->inpWidth;
			resize(srcimg, dstimg, Size(*neww, *newh), 0, 0, INTER_AREA);
			*top = (int)(this->inpHeight - *newh) * 0.5;
			copyMakeBorder(dstimg, dstimg, *top, this->inpHeight - *newh - *top, 0, 0, BORDER_CONSTANT, 0);
		}
	}
	else {
		resize(srcimg, dstimg, Size(*neww, *newh), 0, 0, INTER_AREA);
	}
	return dstimg;
}
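/**
normalize_ is a private member of NanoDet_Plus that normalizes the input image. It takes the image as a Mat and writes the normalized data into the member input_image_.

It reads the image height and width, then visits every pixel of every channel, subtracts the channel mean, divides by the channel standard deviation, and stores the result in input_image_ in planar CHW order (all of channel 0, then channel 1, then channel 2).

This matches the preprocessing the model was trained with, which it needs to produce meaningful predictions.
**/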
void NanoDet_Plus::normalize_(Mat img)
{
	//    img.convertTo(img, CV_32F);
	int row = img.rows;
	int col = img.cols;
	this->input_image_.resize(row * col * img.channels());
	for (int c = 0; c < 3; c++)
	{
		for (int i = 0; i < row; i++)
		{
			for (int j = 0; j < col; j++)
			{
				float pix = img.ptr<uchar>(i)[j * 3 + c];
				//this->input_image_[c * row * col + i * col + j] = (pix / 255.0 - mean[c] / 255.0) / (stds[c] / 255.0);
				this->input_image_[c * row * col + i * col + j] = (pix - mean[c]) / stds[c];
			}
		}
	}
}
/**
softmax_ is a private member of NanoDet_Plus. It takes an array x of length `length` and writes the softmax of x into the array y of the same length.

It exponentiates each element of x while accumulating the sum, then divides each exponential by that sum.

Softmax turns raw scores into a probability distribution. Here it is applied to the distance bins of each box side in generate_proposal.
**/
void NanoDet_Plus::softmax_(const float* x, float* y, int length)
{
	float sum = 0;
	int i = 0;
	for (i = 0; i < length; i++)
	{
		y[i] = exp(x[i]);
		sum += y[i];
	}
	for (i = 0; i < length; i++)
	{
		y[i] /= sum;
	}
}
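/**
generate_proposal is a private member of NanoDet_Plus that turns the model's raw predictions into detection boxes. It reads the flat prediction array preds and appends the decoded boxes to generate_boxes, a vector<BoxInfo>.

Each grid cell contributes len = num_class + 4 * (reg_max + 1) values: the class scores followed by the distance bins for the four box sides. For each stage, the function walks every grid cell and finds the class with the highest score. If that score reaches the threshold, it decodes the box: for each of the four sides it applies softmax over the reg_max + 1 bins, takes the expected bin index, and multiplies by the stage's stride to get a distance in input pixels. The cell's reference point (j * stride, i * stride) minus the left and top distances gives the top-left corner, and plus the right and bottom distances gives the bottom-right corner. The box is then appended to generate_boxes.

The resulting boxes feed the non-maximum suppression step and the final visualization.
**/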

void NanoDet_Plus::generate_proposal(vector<BoxInfo>& generate_boxes, const float* preds)
{
	const int reg_1max = reg_max + 1;
	const int len = this->num_class + 4 * reg_1max;
	for (int n = 0; n < this->num_stages; n++)
	{
		const int stride_ = this->stride[n];
		const int num_grid_y = (int)ceil((float)this->inpHeight / stride_);
		const int num_grid_x = (int)ceil((float)this->inpWidth / stride_);
		cout << "num_grid_x=" << num_grid_x << ",num_grid_y=" << num_grid_y << endl;
		
		for (int i = 0; i < num_grid_y; i++)
		{
			for (int j = 0; j < num_grid_x; j++)
			{
				int max_ind = 0;
				float max_score = 0;
				for (int k = 0; k < num_class; k++)
				{
					if (preds[k] > max_score)
					{
						max_score = preds[k];
						max_ind = k;
					}
				}
				if (max_score >= score_threshold)
				{
					const float* pbox = preds + this->num_class;
					float dis_pred[4];
					float* y = new float[reg_1max];
					for (int k = 0; k < 4; k++)
					{
						softmax_(pbox + k * reg_1max, y, reg_1max);
						float dis = 0.f;
						for (int l = 0; l < reg_1max; l++)
						{
							dis += l * y[l];
						}
						dis_pred[k] = dis * stride_;
					}
					delete[] y;
					/*float pb_cx = (j + 0.5f) * stride_ - 0.5;
					float pb_cy = (i + 0.5f) * stride_ - 0.5;*/
					float pb_cx = j * stride_ ;
					float pb_cy = i * stride_;
					float x0 = pb_cx - dis_pred[0];
					float y0 = pb_cy - dis_pred[1];
					float x1 = pb_cx + dis_pred[2];
					float y1 = pb_cy + dis_pred[3];
					generate_boxes.push_back(BoxInfo{ x0, y0, x1, y1, max_score, max_ind });
				}
				preds += len;
			}
		}
	}
	
}

void NanoDet_Plus::nms(vector<BoxInfo>& input_boxes)
{
	sort(input_boxes.begin(), input_boxes.end(), [](BoxInfo a, BoxInfo b) { return a.score > b.score; });
	vector<float> vArea(input_boxes.size());
	for (int i = 0; i < int(input_boxes.size()); ++i)
	{
		vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1)
			* (input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1);
	}

	vector<bool> isSuppressed(input_boxes.size(), false);
	for (int i = 0; i < int(input_boxes.size()); ++i)
	{
		if (isSuppressed[i]) { continue; }
		for (int j = i + 1; j < int(input_boxes.size()); ++j)
		{
			if (isSuppressed[j]) { continue; }
			float xx1 = (max)(input_boxes[i].x1, input_boxes[j].x1);
			float yy1 = (max)(input_boxes[i].y1, input_boxes[j].y1);
			float xx2 = (min)(input_boxes[i].x2, input_boxes[j].x2);
			float yy2 = (min)(input_boxes[i].y2, input_boxes[j].y2);

			float w = (max)(float(0), xx2 - xx1 + 1);
			float h = (max)(float(0), yy2 - yy1 + 1);
			float inter = w * h;
			float ovr = inter / (vArea[i] + vArea[j] - inter);

			if (ovr >= this->nms_threshold)
			{
				isSuppressed[j] = true;
			}
		}
	}
	// return post_nms;
	int idx_t = 0;
	input_boxes.erase(remove_if(input_boxes.begin(), input_boxes.end(), [&idx_t, &isSuppressed](const BoxInfo& f) { return isSuppressed[idx_t++]; }), input_boxes.end());
}

void NanoDet_Plus::detect(Mat& srcimg)
{
	int newh = 0, neww = 0, top = 0, left = 0;
	Mat cv_image = srcimg.clone();
	Mat dst = this->resize_image(cv_image, &newh, &neww, &top, &left);
	this->normalize_(dst);
	array<int64_t, 4> input_shape_{ 1, 3, this->inpHeight, this->inpWidth };

	auto allocator_info = MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
	Value input_tensor_ = Value::CreateTensor<float>(allocator_info, input_image_.data(), input_image_.size(), input_shape_.data(), input_shape_.size());

	// Run inference
	vector<Value> ort_outputs = ort_session->Run(RunOptions{ nullptr }, &input_names[0], &input_tensor_, 1, output_names.data(), output_names.size());
	// Generate proposals
	vector<BoxInfo> generate_boxes;
	const float* preds = ort_outputs[0].GetTensorMutableData<float>();
	generate_proposal(generate_boxes, preds);

	// Perform non-maximum suppression to eliminate redundant overlapping boxes with
	// lower confidences
	nms(generate_boxes);
	float ratioh = (float)cv_image.rows / newh;
	float ratiow = (float)cv_image.cols / neww;
	for (size_t i = 0; i < generate_boxes.size(); ++i)
	{
		int xmin = (int)max((generate_boxes[i].x1 - left)*ratiow, 0.f);
		int ymin = (int)max((generate_boxes[i].y1 - top)*ratioh, 0.f);
		int xmax = (int)min((generate_boxes[i].x2 - left)*ratiow, (float)cv_image.cols);
		int ymax = (int)min((generate_boxes[i].y2 - top)*ratioh, (float)cv_image.rows);
		rectangle(srcimg, Point(xmin, ymin), Point(xmax, ymax), Scalar(0, 0, 255), 2);
		string label = format("%.2f", generate_boxes[i].score);
		label = this->class_names[generate_boxes[i].label] + ":" + label;
		putText(srcimg, label, Point(xmin, ymin - 5), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 255, 0), 1);
	}
}

int main()
{
	NanoDet_Plus mynet("onnxmodel/nanodet-plus-m_320.onnx", "onnxmodel/coco.names", 0.5, 0.5);
	// choices: "onnxmodel/nanodet-plus-m_320.onnx", "onnxmodel/nanodet-plus-m_416.onnx",
	//          "onnxmodel/nanodet-plus-m-1.5x_320.onnx", "onnxmodel/nanodet-plus-m-1.5x_416.onnx"
	string imgpath = "imgs/person.jpg";
	Mat srcimg = imread(imgpath);
	mynet.detect(srcimg);

	static const string kWinName = "Deep learning object detection in ONNXRuntime";
	namedWindow(kWinName, WINDOW_NORMAL);
	imshow(kWinName, srcimg);
	waitKey(0);
	destroyAllWindows();
}
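
The box decoding in generate_proposal is easier to follow in isolation. The sketch below restates it in plain Python for a single grid cell: softmax over the bins of each side, expected bin index, then scaling by the stride. The names softmax and decode_box are illustrative, and subtracting the maximum before exponentiating is an added numerical-stability step that the C++ softmax_ above does not perform.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_box(center_x, center_y, bin_logits, stride):
    """Turn four groups of bin logits into an (x1, y1, x2, y2) box.

    bin_logits is a flat list of 4 * (reg_max + 1) values ordered as
    left, top, right, bottom, the same order generate_proposal reads them.
    """
    bins = len(bin_logits) // 4
    distances = []
    for side in range(4):
        probs = softmax(bin_logits[side * bins:(side + 1) * bins])
        # Expected bin index, scaled from feature-map units to input pixels.
        expected = sum(index * p for index, p in enumerate(probs))
        distances.append(expected * stride)
    left, top, right, bottom = distances
    return (center_x - left, center_y - top, center_x + right, center_y + bottom)
```

With reg_max = 7 and all-zero logits, every side has a uniform distribution over bins 0 to 7, so the expected index is 3.5 and a stride of 8 gives a distance of 28 pixels on each side.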

main.py

import cv2
import numpy as np
import argparse
import onnxruntime as ort
import math


class my_nanodet():
    def __init__(self, model_pb_path, label_path, prob_threshold=0.4, iou_threshold=0.3):
        self.classes = list(map(lambda x: x.strip(), open(label_path, 'r').readlines()))
        self.num_classes = len(self.classes)
        self.prob_threshold = prob_threshold
        self.iou_threshold = iou_threshold
        ### normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
        self.mean = np.array([103.53, 116.28, 123.675], dtype=np.float32).reshape(1, 1, 3)
        self.std = np.array([57.375, 57.12, 58.395], dtype=np.float32).reshape(1, 1, 3)
        so = ort.SessionOptions()
        so.log_severity_level = 3
        self.net = ort.InferenceSession(model_pb_path, so)
        self.input_shape = (self.net.get_inputs()[0].shape[2], self.net.get_inputs()[0].shape[3])
        self.reg_max = int((self.net.get_outputs()[0].shape[-1] - self.num_classes) / 4) - 1
        self.project = np.arange(self.reg_max + 1)
        self.strides = (8, 16, 32, 64)
        self.mlvl_anchors = []
        for i in range(len(self.strides)):
            anchors = self._make_grid(
                (math.ceil(self.input_shape[0] / self.strides[i]), math.ceil(self.input_shape[1] / self.strides[i])),
                self.strides[i])
            self.mlvl_anchors.append(anchors)
        self.keep_ratio = False
    
    def _make_grid(self, featmap_size, stride):
        feat_h, feat_w = featmap_size
        shift_x = np.arange(0, feat_w) * stride
        shift_y = np.arange(0, feat_h) * stride
        xv, yv = np.meshgrid(shift_x, shift_y)
        xv = xv.flatten()
        yv = yv.flatten()
        return np.stack((xv, yv), axis=-1)
        # cx = xv + 0.5 * (stride - 1)
        # cy = yv + 0.5 * (stride - 1)
        # return np.stack((cx, cy), axis=-1)
    
    def softmax(self, x, axis=1):
        x_exp = np.exp(x)
        # use axis=0 if x is a column vector
        x_sum = np.sum(x_exp, axis=axis, keepdims=True)
        s = x_exp / x_sum
        return s
    
    def _normalize(self, img):
        img = img.astype(np.float32)
        # img = (img / 255.0 - self.mean / 255.0) / (self.std / 255.0)
        img = (img - self.mean) / (self.std)
        return img
    
    def resize_image(self, srcimg, keep_ratio=True):
        top, left, newh, neww = 0, 0, self.input_shape[0], self.input_shape[1]
        if keep_ratio and srcimg.shape[0] != srcimg.shape[1]:
            hw_scale = srcimg.shape[0] / srcimg.shape[1]
            if hw_scale > 1:
                newh, neww = self.input_shape[0], int(self.input_shape[1] / hw_scale)
                img = cv2.resize(srcimg, (neww, newh), interpolation=cv2.INTER_AREA)
                left = int((self.input_shape[1] - neww) * 0.5)
                img = cv2.copyMakeBorder(img, 0, 0, left, self.input_shape[1] - neww - left, cv2.BORDER_CONSTANT,
                                         value=0)  # add border
            else:
                newh, neww = int(self.input_shape[0] * hw_scale), self.input_shape[1]
                img = cv2.resize(srcimg, (neww, newh), interpolation=cv2.INTER_AREA)
                top = int((self.input_shape[0] - newh) * 0.5)
                img = cv2.copyMakeBorder(img, top, self.input_shape[0] - newh - top, 0, 0, cv2.BORDER_CONSTANT, value=0)
        else:
            # cv2.resize expects (width, height); input_shape is (height, width)
            img = cv2.resize(srcimg, (self.input_shape[1], self.input_shape[0]), interpolation=cv2.INTER_AREA)
        return img, newh, neww, top, left
    
    def post_process(self, preds, scale_factor=1, rescale=False):
        mlvl_bboxes = []
        mlvl_scores = []
        ind = 0
        for stride, anchors in zip(self.strides, self.mlvl_anchors):
            cls_score, bbox_pred = preds[ind:(ind + anchors.shape[0]), :self.num_classes], preds[ind:(ind + anchors.shape[0]), self.num_classes:]
            ind += anchors.shape[0]
            bbox_pred = self.softmax(bbox_pred.reshape(-1, self.reg_max + 1), axis=1)
            # bbox_pred = np.sum(bbox_pred * np.expand_dims(self.project, axis=0), axis=1).reshape((-1, 4))
            bbox_pred = np.dot(bbox_pred, self.project).reshape(-1, 4)
            bbox_pred *= stride
            
            # nms_pre = cfg.get('nms_pre', -1)
            nms_pre = 1000
            if nms_pre > 0 and cls_score.shape[0] > nms_pre:
                max_scores = cls_score.max(axis=1)
                topk_inds = max_scores.argsort()[::-1][0:nms_pre]
                anchors = anchors[topk_inds, :]
                bbox_pred = bbox_pred[topk_inds, :]
                cls_score = cls_score[topk_inds, :]
            
            bboxes = self.distance2bbox(anchors, bbox_pred, max_shape=self.input_shape)
            mlvl_bboxes.append(bboxes)
            mlvl_scores.append(cls_score)
        
        mlvl_bboxes = np.concatenate(mlvl_bboxes, axis=0)
        if rescale:
            mlvl_bboxes /= scale_factor
        mlvl_scores = np.concatenate(mlvl_scores, axis=0)
        
        bboxes_wh = mlvl_bboxes.copy()
        bboxes_wh[:, 2:4] = bboxes_wh[:, 2:4] - bboxes_wh[:, 0:2]  ####xywh
        classIds = np.argmax(mlvl_scores, axis=1)
        confidences = np.max(mlvl_scores, axis=1)  ####max_class_confidence
        
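        # NMSBoxes can return an empty tuple when nothing survives, so convert before flattening
        indices = np.array(cv2.dnn.NMSBoxes(bboxes_wh.tolist(), confidences.tolist(), self.prob_threshold,
                                            self.iou_threshold), dtype=int).flatten()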
        if len(indices) > 0:
            mlvl_bboxes = mlvl_bboxes[indices]
            confidences = confidences[indices]
            classIds = classIds[indices]
            return mlvl_bboxes, confidences, classIds
        else:
            print('nothing detected')
            return np.array([]), np.array([]), np.array([])
    
    def distance2bbox(self, points, distance, max_shape=None):
        x1 = points[:, 0] - distance[:, 0]
        y1 = points[:, 1] - distance[:, 1]
        x2 = points[:, 0] + distance[:, 2]
        y2 = points[:, 1] + distance[:, 3]
        if max_shape is not None:
            x1 = np.clip(x1, 0, max_shape[1])
            y1 = np.clip(y1, 0, max_shape[0])
            x2 = np.clip(x2, 0, max_shape[1])
            y2 = np.clip(y2, 0, max_shape[0])
        return np.stack([x1, y1, x2, y2], axis=-1)
    
    def detect(self, srcimg):
        img, newh, neww, top, left = self.resize_image(srcimg, keep_ratio=self.keep_ratio)
        img = self._normalize(img)
        blob = np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0)
        
        outs = self.net.run(None, {self.net.get_inputs()[0].name: blob})[0].squeeze(axis=0)
        det_bboxes, det_conf, det_classid = self.post_process(outs)
        
        # results = []
        ratioh, ratiow = srcimg.shape[0] / newh, srcimg.shape[1] / neww
        for i in range(det_bboxes.shape[0]):
            xmin, ymin, xmax, ymax = max(int((det_bboxes[i, 0] - left) * ratiow), 0), max(
                int((det_bboxes[i, 1] - top) * ratioh), 0), min(
                int((det_bboxes[i, 2] - left) * ratiow), srcimg.shape[1]), min(int((det_bboxes[i, 3] - top) * ratioh),
                                                                               srcimg.shape[0])
            # results.append((xmin, ymin, xmax, ymax, self.classes[det_classid[i]], det_conf[i]))
            cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), (0, 0, 255), thickness=1)
            print(self.classes[det_classid[i]] + ': ' + str(round(det_conf[i], 3)))
            cv2.putText(srcimg, self.classes[det_classid[i]] + ': ' + str(round(det_conf[i], 3)), (xmin, ymin - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), thickness=1)
        #         cv2.imwrite('result.jpg', srcimg)
        return srcimg


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--imgpath', type=str, default='imgs/person.jpg', help="image path")
    parser.add_argument('--modelpath', type=str, default='onnxmodel/nanodet-plus-m_320.onnx',
                        choices=["onnxmodel/nanodet-plus-m_320.onnx", "onnxmodel/nanodet-plus-m_416.onnx",
                                 "onnxmodel/nanodet-plus-m-1.5x_320.onnx", "onnxmodel/nanodet-plus-m-1.5x_416.onnx"],
                        help="onnx filepath")
    parser.add_argument('--classfile', type=str, default='onnxmodel/coco.names', help="classname filepath")
    parser.add_argument('--confThreshold', default=0.4, type=float, help='class confidence')
    parser.add_argument('--nmsThreshold', default=0.6, type=float, help='nms iou thresh')
    args = parser.parse_args()
    
    srcimg = cv2.imread(args.imgpath)
    net = my_nanodet(args.modelpath, args.classfile, prob_threshold=args.confThreshold, iou_threshold=args.nmsThreshold)
    srcimg = net.detect(srcimg)
    
    winName = 'Deep learning object detection in ONNXRuntime'
    cv2.namedWindow(winName, cv2.WINDOW_NORMAL)
    cv2.imshow(winName, srcimg)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
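
The C++ version implements non-maximum suppression by hand, while the Python version delegates it to cv2.dnn.NMSBoxes. For readers who want to see the idea without either, here is a small pure-Python sketch of greedy NMS. The names iou and greedy_nms are illustrative. Unlike the C++ nms above, it does not add 1 to widths and heights, so results can differ slightly for boxes near the threshold.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[idx], boxes[k]) < iou_threshold for k in keep):
            keep.append(idx)
    return keep
```

A lower threshold suppresses more aggressively: two boxes with an IoU of about 0.68 collapse into one at a threshold of 0.5 but both survive at 0.7.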