Getting Started with TensorRT Model Deployment (Python / C++)


Table of Contents

    • 1. Installing TensorRT
      • 1.1 CUDA/cuDNN and creating a virtual environment
      • 1.2 Installing the TensorRT version that matches your CUDA version
    • 2. Model Conversion
      • 2.1 Converting .pth to ONNX
      • 2.2 Converting ONNX to an engine
    • 3. Deployment with TensorRT
      • TensorRT inference (Python API)
      • TensorRT inference (C++ API)
    • Possible problems
    • References

1. Installing TensorRT

1.1 CUDA/cuDNN and creating a virtual environment

CUDA download: https://developer.nvidia.com/cuda-toolkit-archive
cuDNN installation guide: https://docs.nvidia.com/deeplearning/cudnn/latest/installation/windows.html

1.2 Installing the TensorRT version that matches your CUDA version

TensorRT download: https://developer.nvidia.com/tensorrt/download
Refer to the TensorRT Installation Guide for details.
After downloading, extract the archive and add the lib folder to the PATH environment variable.
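If you also want to call TensorRT from Python, the downloaded package normally ships Python wheels in its python folder that you can pip-install into your virtual environment. A minimal sanity check (a sketch, not part of the official installation steps):

import tensorrt as trt

# The printed version should match the TensorRT release you downloaded
print(trt.__version__)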

2. Model Conversion

2.1 Converting .pth to ONNX

Install the onnx module in Python: pip install onnx

import torch

# 'model' is your trained torch.nn.Module (in eval mode) and 'x' is a dummy
# input tensor with the expected shape, e.g. torch.randn(1, 3, H, W)
input_name = 'input'
output_name = 'output'
torch.onnx.export(model,                        # model being run
                  x,                            # model input
                  "model.onnx",                 # where to save the model (can be a file or file-like object)
                  opset_version=11,             # the ONNX opset version to export the model to
                  input_names=[input_name],     # the model's input names
                  output_names=[output_name],   # the model's output names
                  dynamic_axes={
                      input_name: {0: 'batch_size', 2: 'in_width', 3: 'in_height'},
                      output_name: {0: 'batch_size', 2: 'out_width', 3: 'out_height'}
                  })
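Before handing the ONNX file to TensorRT, it can be worth verifying it with the onnx package installed above. A small optional sketch ("model.onnx" is the file exported above):

import onnx

onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model)                   # raises an exception if the graph is malformed
print(onnx.helper.printable_graph(onnx_model.graph))   # human-readable summary of the graph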

2.2 Converting ONNX to an engine

Note: TensorRT's ONNX parser is built against specific ONNX/PyTorch export versions; if the versions do not match, errors may occur during model conversion.
1. Using the command-line tool

This mainly invokes the trtexec executable in the bin folder.

trtexec.exe --onnx=model.onnx --saveEngine=model.engine --workspace=6000
# Build an engine with a static batch size
./trtexec 	--onnx=<onnx_file> \ 						# ONNX model file
        	--explicitBatch \ 							# use an explicit batch size when building the engine (default = implicit)
        	--saveEngine=<tensorRT_engine_file> \ 		# output engine file
        	--workspace=<size_in_megabytes> \ 			# workspace size in MB (default = 16 MB)
        	--fp16 										# enable fp16 precision in addition to fp32 (default = disabled)
        
# Build an engine with a dynamic batch size
./trtexec 	--onnx=<onnx_file> \						# ONNX model file
        	--minShapes=input:<shape_of_min_batch> \ 	# minimum NCHW shape
        	--optShapes=input:<shape_of_opt_batch> \  	# optimal input shape (usually the same as maxShapes)
        	--maxShapes=input:<shape_of_max_batch> \ 	# maximum input shape
        	--workspace=<size_in_megabytes> \ 			# workspace size in MB (default = 16 MB)
        	--saveEngine=<engine_file> \   				# output engine file
        	--fp16   									# enable fp16 precision in addition to fp32 (default = disabled)


# Smaller images allow a larger batch size, e.g. 8x3x416x416
/home/zxl/TensorRT-7.2.3.4/bin/trtexec  --onnx=yolov4_-1_3_416_416_dynamic.onnx \
                                        --minShapes=input:1x3x416x416 \
                                        --optShapes=input:8x3x416x416 \
                                        --maxShapes=input:8x3x416x416 \
                                        --workspace=4096 \
                                        --saveEngine=yolov4_-1_3_416_416_dynamic_b8_fp16.engine \
                                        --fp16

# Reduced to 4x3x608x608 because GPU memory ran out
/home/zxl/TensorRT-7.2.3.4/bin/trtexec  --onnx=yolov4_-1_3_608_608_dynamic.onnx \
                                        --minShapes=input:1x3x608x608 \
                                        --optShapes=input:4x3x608x608 \
                                        --maxShapes=input:4x3x608x608 \
                                        --workspace=4096 \
                                        --saveEngine=yolov4_-1_3_608_608_dynamic_b4_fp16.engine \
                                        --fp16           
                                        

You can also run trtexec.exe --help to see what each trtexec option means:

D:\Work\cuda_gpu\sdk\TensorRT-8.5.1.7\bin>trtexec.exe --help
&&&& RUNNING TensorRT.trtexec [TensorRT v8501] # trtexec.exe --help
=== Model Options ===
  --uff=<file>                UFF model
  --onnx=<file>               ONNX model
  --model=<file>              Caffe model (default = no model, random weights used)
  --deploy=<file>             Caffe prototxt file
  --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
  --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
  --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)

=== Build Options ===
  --maxBatch                  Set max batch size and build an implicit batch engine (default = same size as --batch)
                              This option should not be used when the input model is ONNX or when dynamic shapes are provided.
  --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
  --minShapesCalib=spec       Calibrate with dynamic shapes using a profile with the min shapes provided
  --optShapesCalib=spec       Calibrate with dynamic shapes using a profile with the opt shapes provided
  --maxShapesCalib=spec       Calibrate with dynamic shapes using a profile with the max shapes provided
                              Note: All three of min, opt and max shapes must be supplied.
                                    However, if only opt shapes is supplied then it will be expanded so
                                    that min shapes and max shapes are set to the same values as opt shapes.
                                    Input names can be wrapped with escaped single quotes (ex: \'Input:0\').
                              Example input shapes spec: input0:1x3x256x256,input1:1x3x128x128
                              Each input shape is supplied as a key-value pair where key is the input name and
                              value is the dimensions (including the batch dimension) to be used for that input.
                              Each key-value pair has the key and value separated using a colon (:).
                              Multiple input shapes can be provided via comma-separated key-value pairs.
  --inputIOFormats=spec       Type and format of each of the input tensors (default = all inputs in fp32:chw)
                              See --outputIOFormats help for the grammar of type and format list.
                              Note: If this option is specified, please set comma-separated types and formats for all
                                    inputs following the same order as network inputs ID (even if only one input
                                    needs specifying IO format) or set the type and format once for broadcasting.
  --outputIOFormats=spec      Type and format of each of the output tensors (default = all outputs in fp32:chw)
                              Note: If this option is specified, please set comma-separated types and formats for all
                                    outputs following the same order as network outputs ID (even if only one output
                                    needs specifying IO format) or set the type and format once for broadcasting.
                              IO Formats: spec  ::= IOfmt[","spec]
                                          IOfmt ::= type:fmt
                                          type  ::= "fp32"|"fp16"|"int32"|"int8"
                                          fmt   ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32"|"dhwc8"|
                                                     "cdhw32"|"hwc"|"dla_linear"|"dla_hwc4")["+"fmt]
  --workspace=N               Set workspace size in MiB.
  --memPoolSize=poolspec      Specify the size constraints of the designated memory pool(s) in MiB.
                              Note: Also accepts decimal sizes, e.g. 0.25MiB. Will be rounded down to the nearest integer bytes.
                              Pool constraint: poolspec ::= poolfmt[","poolspec]
                                               poolfmt ::= pool:sizeInMiB
                                               pool ::= "workspace"|"dlaSRAM"|"dlaLocalDRAM"|"dlaGlobalDRAM"
  --profilingVerbosity=mode   Specify profiling verbosity. mode ::= layer_names_only|detailed|none (default = layer_names_only)
  --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
  --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
  --refit                     Mark the engine as refittable. This will allow the inspection of refittable layers
                              and weights within the engine.
  --sparsity=spec             Control sparsity (default = disabled).
                              Sparsity: spec ::= "disable", "enable", "force"
                              Note: Description about each of these options is as below
                                    disable = do not enable sparse tactics in the builder (this is the default)
                                    enable  = enable sparse tactics in the builder (but these tactics will only be
                                              considered if the weights have the right sparsity pattern)
                                    force   = enable sparse tactics in the builder and force-overwrite the weights to have
                                              a sparsity pattern (even if you loaded a model yourself)
  --noTF32                    Disable tf32 precision (default is to enable tf32, in addition to fp32)
  --fp16                      Enable fp16 precision, in addition to fp32 (default = disabled)
  --int8                      Enable int8 precision, in addition to fp32 (default = disabled)
  --best                      Enable all precisions to achieve the best performance (default = disabled)
  --directIO                  Avoid reformatting at network boundaries. (default = disabled)
  --precisionConstraints=spec Control precision constraint setting. (default = none)
                                  Precision Constaints: spec ::= "none" | "obey" | "prefer"
                                  none = no constraints
                                  prefer = meet precision constraints set by --layerPrecisions/--layerOutputTypes if possible
                                  obey = meet precision constraints set by --layerPrecisions/--layerOutputTypes or fail
                                         otherwise
  --layerPrecisions=spec      Control per-layer precision constraints. Effective only when precisionConstraints is set to
                              "obey" or "prefer". (default = none)
                              The specs are read left-to-right, and later ones override earlier ones. "*" can be used as a
                              layerName to specify the default precision for all the unspecified layers.
                              Per-layer precision spec ::= layerPrecision[","spec]
                                                  layerPrecision ::= layerName":"precision
                                                  precision ::= "fp32"|"fp16"|"int32"|"int8"
  --layerOutputTypes=spec     Control per-layer output type constraints. Effective only when precisionConstraints is set to
                              "obey" or "prefer". (default = none)
                              The specs are read left-to-right, and later ones override earlier ones. "*" can be used as a
                              layerName to specify the default precision for all the unspecified layers. If a layer has more than
                              one output, then multiple types separated by "+" can be provided for this layer.
                              Per-layer output type spec ::= layerOutputTypes[","spec]
                                                    layerOutputTypes ::= layerName":"type
                                                    type ::= "fp32"|"fp16"|"int32"|"int8"["+"type]
  --calib=<file>              Read INT8 calibration cache file
  --safe                      Enable build safety certified engine
  --consistency               Perform consistency checking on safety certified engine
  --restricted                Enable safety scope checking with kSAFETY_SCOPE build flag
  --saveEngine=<file>         Save the serialized engine
  --loadEngine=<file>         Load a serialized engine
  --tacticSources=tactics     Specify the tactics to be used by adding (+) or removing (-) tactics from the default
                              tactic sources (default = all available tactics).
                              Note: Currently only cuDNN, cuBLAS, cuBLAS-LT, and edge mask convolutions are listed as optional
                                    tactics.
                              Tactic Sources: tactics ::= [","tactic]
                                              tactic  ::= (+|-)lib
                                              lib     ::= "CUBLAS"|"CUBLAS_LT"|"CUDNN"|"EDGE_MASK_CONVOLUTIONS"
                                                          |"JIT_CONVOLUTIONS"
                              For example, to disable cudnn and enable cublas: --tacticSources=-CUDNN,+CUBLAS
  --noBuilderCache            Disable timing cache in builder (default is to enable timing cache)
  --heuristic                 Enable tactic selection heuristic in builder (default is to disable the heuristic)
  --timingCacheFile=<file>    Save/load the serialized global timing cache
  --preview=features          Specify preview feature to be used by adding (+) or removing (-) preview features from the default
                              Preview Features: features ::= [","feature]
                                                feature  ::= (+|-)flag
                                                flag     ::= "fasterDynamicShapes0805"
                                                             |"disableExternalTacticSourcesForCore0805"

=== Inference Options ===
  --batch=N                   Set batch size for implicit batch engines (default = 1)
                              This option should not be used when the engine is built from an ONNX model or when dynamic
                              shapes are provided when the engine is built.
  --shapes=spec               Set input shapes for dynamic shapes inference inputs.
                              Note: Input names can be wrapped with escaped single quotes (ex: \'Input:0\').
                              Example input shapes spec: input0:1x3x256x256, input1:1x3x128x128
                              Each input shape is supplied as a key-value pair where key is the input name and
                              value is the dimensions (including the batch dimension) to be used for that input.
                              Each key-value pair has the key and value separated using a colon (:).
                              Multiple input shapes can be provided via comma-separated key-value pairs.
  --loadInputs=spec           Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
                              Input values spec ::= Ival[","spec]
                                           Ival ::= name":"file
  --iterations=N              Run at least N inference iterations (default = 10)
  --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
  --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
  --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
  --idleTime=N                Sleep N milliseconds between two continuous iterations(default = 0)
  --streams=N                 Instantiate N engines to use concurrently (default = 1)
  --exposeDMA                 Serialize DMA transfers to and from device (default = disabled).
  --noDataTransfers           Disable DMA transfers to and from device (default = enabled).
  --useManagedMemory          Use managed memory instead of separate host and device allocations (default = disabled).
  --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
  --threads                   Enable multithreading to drive engines with independent threads or speed up refitting (default = disabled)
  --useCudaGraph              Use CUDA graph to capture engine execution and then launch inference (default = disabled).
                              This flag may be ignored if the graph capture fails.
  --timeDeserialize           Time the amount of time it takes to deserialize the network and exit.
  --timeRefit                 Time the amount of time it takes to refit the engine before inference.
  --separateProfileRun        Do not attach the profiler in the benchmark run; if profiling is enabled, a second profile run will be executed (default = disabled)
  --buildOnly                 Exit after the engine has been built and skip inference perf measurement (default = disabled)
  --persistentCacheRatio      Set the persistentCacheLimit in ratio, 0.5 represent half of max persistent L2 size (default = 0)

=== Build and Inference Batch Options ===
                              When using implicit batch, the max batch size of the engine, if not given,
                              is set to the inference batch size;
                              when using explicit batch, if shapes are specified only for inference, they
                              will be used also as min/opt/max in the build profile; if shapes are
                              specified only for the build, the opt shapes will be used also for inference;
                              if both are specified, they must be compatible; and if explicit batch is
                              enabled but neither is specified, the model must provide complete static
                              dimensions, including batch size, for all inputs
                              Using ONNX models automatically forces explicit batch.

=== Reporting Options ===
  --verbose                   Use verbose logging (default = false)
  --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
  --percentile=P1,P2,P3,...   Report performance for the P1,P2,P3,... percentages (0<=P_i<=100, 0 representing max perf, and 100 representing min perf; (default = 90,95,99%)
  --dumpRefit                 Print the refittable layers and weights from a refittable engine
  --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
  --dumpProfile               Print profile information per layer (default = disabled)
  --dumpLayerInfo             Print layer information of the engine to console (default = disabled)
  --exportTimes=<file>        Write the timing results in a json file (default = disabled)
  --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
  --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)
  --exportLayerInfo=<file>    Write the layer information of the engine in a json file (default = disabled)

=== System Options ===
  --device=N                  Select cuda device N (default = 0)
  --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
  --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
  --plugins                   Plugin library (.so) to load (can be specified multiple times)

=== Help ===
  --help, -h                  Print this message

2. Converting to a TensorRT engine with the TensorRT API

import tensorrt as trt

def generate_engine(onnx_path, engine_path):
    # 1. Create the TensorRT logger
    logger = trt.Logger(trt.Logger.WARNING)
    # Initialize the built-in plugins
    trt.init_libnvinfer_plugins(logger, namespace="")

    # 2. Create a builder with the logger
    builder = trt.Builder(logger)

    # 3. Create a builder config that controls how TensorRT optimizes the model
    config = builder.create_builder_config()
    # Set the workspace memory limit (1 MiB here; increase it, e.g. 1 << 30, for real models)
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)
    # Set the precision
    config.set_flag(trt.BuilderFlag.FP16)
    # INT8 additionally requires calibration

    # 4. Create a network. EXPLICIT_BATCH makes the batch dimension explicit (required for ONNX)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    # Create the ONNX parser
    parser = trt.OnnxParser(network, logger)
    # Parse the ONNX model and populate the network
    success = parser.parse_from_file(onnx_path)
    # Report any parsing errors
    for idx in range(parser.num_errors):
        print(parser.get_error(idx))
    if not success:
        pass  # Error handling code here

    # 5. Serialize the engine, i.e. generate the .engine model
    serialized_engine = builder.build_serialized_network(network, config)
    # Save the serialized engine for later use. The engine is not portable:
    # it is tied to the TensorRT version and GPU type it was built with.
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)

    # 6. To deserialize the engine for inference later, use the runtime API:
    # runtime = trt.Runtime(logger)
    # engine = runtime.deserialize_cuda_engine(serialized_engine)
    # with open("sample.engine", "rb") as f:
    #     serialized_engine = f.read()
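A minimal usage sketch (the file names are placeholders):

if __name__ == "__main__":
    generate_engine("model.onnx", "model.engine")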

After completing the steps above, you will have a model file converted to TensorRT's engine format (e.g. model.engine), which can be used for TensorRT inference and deployment.

3. Deployment with TensorRT

TensorRT can be deployed through either its Python API or its C++ API.

TensorRT inference (Python API)

Once the TensorRT environment is installed, you can try converting pretrained weights and deploying them by running the following code:

import numpy as np
import torch
import tensorrt as trt
from collections import OrderedDict, namedtuple

def infer(img_data, engine_path, device='cuda'):
    # 1. Logger
    logger = trt.Logger(trt.Logger.INFO)
    # 2. Use the runtime to load the TensorRT engine
    runtime = trt.Runtime(logger)
    trt.init_libnvinfer_plugins(logger, '')  # initialize TensorRT plugins
    with open(engine_path, "rb") as f:
        serialized_engine = f.read()
    engine = runtime.deserialize_cuda_engine(serialized_engine)

    # 3. Bind the inputs and outputs
    bindings = OrderedDict()
    Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
    fp16 = False
    for index in range(engine.num_bindings):
        name = engine.get_binding_name(index)
        dtype = trt.nptype(engine.get_binding_dtype(index))
        shape = tuple(engine.get_binding_shape(index))
        data = torch.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to(device)
        # Tensor.data_ptr() is the address (an int) of the tensor's first element
        bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
        if engine.binding_is_input(index) and dtype == np.float16:
            fp16 = True
    # Record the device pointer of each input/output
    binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())

    # 4. Load the data, bind it, and run inference; results are written into the output bindings
    context = engine.create_execution_context()
    binding_addrs['images'] = int(img_data.data_ptr())
    context.execute_v2(list(binding_addrs.values()))

    # 5. Fetch the results (by the input/output names set when exporting the ONNX model)
    nums = bindings['num'].data[0]
    boxes = bindings['boxes'].data[0]
    scores = bindings['scores'].data[0]
    classes = bindings['classes'].data[0]
    return nums, boxes, scores, classes
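A possible way to call it, assuming the engine's input binding is named 'images' (as in the function above) and expects a 1x3x640x640 float32 tensor; the image path and input size are placeholders:

import cv2
import numpy as np
import torch

img = cv2.imread("test.jpg")                                  # HWC, BGR, uint8
img = cv2.resize(img, (640, 640))
img = img[:, :, ::-1].transpose(2, 0, 1)                      # BGR -> RGB, HWC -> CHW
img = np.ascontiguousarray(img, dtype=np.float32) / 255.0     # normalize to [0, 1]
img_data = torch.from_numpy(img).unsqueeze(0).to('cuda')      # 1x3x640x640 on the GPU

nums, boxes, scores, classes = infer(img_data, "model.engine")
print(nums, boxes.shape, scores.shape, classes.shape)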

TensorRT inference (C++ API)

In the project's property pages, open Linker → Input and add the following libraries to Additional Dependencies:

cudnn.lib
cublas.lib
cudart.lib
nvinfer.lib
nvparsers.lib
nvonnxparser.lib
nvinfer_plugin.lib
opencv_world460d.lib

Full inference code:

#include <cassert>
#include <cfloat>
#include <fstream>
#include <iostream>
#include <memory>
#include <sstream>

#include <cuda_runtime_api.h>
#include "NvInfer.h"
#include "NvOnnxParser.h"
#include "logger.h"


using sample::gLogError;
using sample::gLogInfo;

using namespace nvinfer1;

// The logger controls which log levels get printed
// TRTLogger inherits from nvinfer1::ILogger
class TRTLogger : public nvinfer1::ILogger
{
	void log(Severity severity, const char *msg) noexcept override
	{
		// Suppress INFO-level messages
		if (severity != Severity::kINFO)
			std::cout << msg << std::endl;
	}
} gLogger;
int ReadEngineData(const char* enginePath, char*& engineData)
{
	// Read the serialized engine file
	std::ifstream engineFile(enginePath, std::ios::binary);
	if (engineFile.fail())
	{
		std::cerr << "Failed to open file!" << std::endl;
		return -1;
	}

	engineFile.seekg(0, std::ifstream::end);
	auto fsize = engineFile.tellg();
	engineFile.seekg(0, std::ifstream::beg);

	if (nullptr == engineData)
	{
		engineData = new char[fsize];
	}

	engineFile.read(engineData, fsize);
	engineFile.close();
	return fsize;
}
size_t getMemorySize(nvinfer1::Dims32 input_dims, int typeSize)
{
	size_t psize = input_dims.d[0] * input_dims.d[1] * input_dims.d[2] * input_dims.d[3] * typeSize;
	return psize;
}
bool inferDemo(float* input_buffer, int* tensorSize)
{
	int batchsize = tensorSize[0];
	int channel = tensorSize[1];
	int width = tensorSize[2];
	int height = tensorSize[3];

	size_t dataSize = width * height*channel*batchsize;

	// Read the engine file
	const char* enginePath = "net_model.engine";
	char* engineData = nullptr;
	int fsize = ReadEngineData(enginePath, engineData);
	printf("fsize=%d\n", fsize);

	// Create the runtime & load the engine
	// TRTLogger glogger; // could be used instead of sample::gLogger.getTRTLogger()
	std::unique_ptr<nvinfer1::IRuntime> runtime{ nvinfer1::createInferRuntime(sample::gLogger.getTRTLogger()) };
	std::unique_ptr<nvinfer1::ICudaEngine> mEngine(runtime->deserializeCudaEngine(engineData, fsize));
	assert(mEngine.get() != nullptr);
	delete[] engineData; // the serialized data is no longer needed once the engine is built

	// Create the execution context
	std::unique_ptr<nvinfer1::IExecutionContext> context(mEngine->createExecutionContext());
	const char* name0 = mEngine->getBindingName(0);
	const char* name1 = mEngine->getBindingName(1);
	const char* name2 = mEngine->getBindingName(2);
	const char* name3 = mEngine->getBindingName(3);

	printf("name0=%s\nname1=%s\nname2=%s\nname3=%s\n", name0, name1, name2, name3);
	// Get the input size
	auto input_idx = mEngine->getBindingIndex("input");
 	if (input_idx == -1)
	{
		return false;
	}
	assert(mEngine->getBindingDataType(input_idx) == nvinfer1::DataType::kFLOAT);
	auto input_dims = context->getBindingDimensions(input_idx);
	context->setBindingDimensions(input_idx, input_dims);
	auto input_size = getMemorySize(input_dims, sizeof(float_t));

	// Get the output sizes; memory must be allocated for every output
	auto output1_idx = mEngine->getBindingIndex("output1");
	if (output1_idx == -1)
	{
		return false;
	}
	assert(mEngine->getBindingDataType(output1_idx) == nvinfer1::DataType::kFLOAT);
	auto output1_dims = context->getBindingDimensions(output1_idx);
	auto output1_size = getMemorySize(output1_dims, sizeof(float_t));

	auto output2_idx = mEngine->getBindingIndex("output2");
	if (output2_idx == -1)
	{
		return false;
	}
	assert(mEngine->getBindingDataType(output2_idx) == nvinfer1::DataType::kFLOAT);
	auto output2_dims = context->getBindingDimensions(output2_idx);
	auto output2_size = getMemorySize(output2_dims, sizeof(float_t));

	auto output3_idx = mEngine->getBindingIndex("output3");
	if (output3_idx == -1)
	{
		return false;
	}
	assert(mEngine->getBindingDataType(output3_idx) == nvinfer1::DataType::kFLOAT);
	auto output3_dims = context->getBindingDimensions(output3_idx);
	auto output3_size = getMemorySize(output3_dims, sizeof(float_t));

	// Prepare for inference
	// Allocate CUDA memory
	void* input_mem{ nullptr };
	if (cudaMalloc(&input_mem, input_size) != cudaSuccess)
	{
		gLogError << "ERROR: input cuda memory allocation failed, size = " << input_size << " bytes" << std::endl;
		return false;
	}

	void* output1_mem{ nullptr };
	if (cudaMalloc(&output1_mem, output1_size) != cudaSuccess)
	{
		gLogError << "ERROR: output cuda memory allocation failed, size = " << output1_size << " bytes" << std::endl;
		return false;
	}
	void* output2_mem{ nullptr };
	if (cudaMalloc(&output2_mem, output2_size) != cudaSuccess)
	{
		gLogError << "ERROR: output cuda memory allocation failed, size = " << output2_size << " bytes" << std::endl;
		return false;
	}
	void* output3_mem{ nullptr };
	if (cudaMalloc(&output3_mem, output3_size) != cudaSuccess)
	{
		gLogError << "ERROR: output cuda memory allocation failed, size = " << output3_size << " bytes" << std::endl;
		return false;
	}

	// Copy the input data to the device
	cudaMemcpy(input_mem, input_buffer, input_size, cudaMemcpyHostToDevice); // cudaMemcpyHostToDevice: host to device, i.e. system RAM to GPU memory

	// Bind the input/output device buffers and pass them to inference together
	void* bindings[4];
	bindings[input_idx] = input_mem;
	bindings[output1_idx] = output1_mem;
	bindings[output2_idx] = output2_mem;
	bindings[output3_idx] = output3_mem;

	// Run inference
	bool status = context->executeV2(bindings);

	if (!status)
	{
		gLogError << "ERROR: inference failed" << std::endl;
		cudaFree(input_mem);
		cudaFree(output1_mem);
		cudaFree(output2_mem);
		cudaFree(output3_mem);
		return false;
	}

	// Fetch the results
	float* output3_buffer = new float[output3_size / sizeof(float)];
	cudaMemcpy(output3_buffer, output3_mem, output3_size, cudaMemcpyDeviceToHost);

	// Free the CUDA memory
	cudaFree(input_mem);
	cudaFree(output1_mem);
	cudaFree(output2_mem);
	cudaFree(output3_mem);

	cudaError_t err = cudaGetLastError();
	if (err != cudaSuccess) {
		gLogError << "ERROR: failed to free CUDA memory: " << cudaGetErrorString(err) << std::endl;
		return false;
	}

	// save the results



	delete[] output3_buffer;
	output3_buffer = nullptr;

	return true;
}
int main()
{
	int batchsize = 1;
	int channel = 3;
	int width = 256;
	int height = 256;
	size_t dataSize = width * height*channel*batchsize;
	int tensorSize[4] = { batchsize, channel, width, height };
	float* input_buffer = new float[dataSize];
	for (size_t i = 0; i < dataSize; i++)
		input_buffer[i] = 0.1f;

	inferDemo(input_buffer, tensorSize);

	delete[] input_buffer;
	input_buffer = nullptr;

	system("pause");
	return 0;
}

Possible problems

cuDNN errors during TensorRT conversion:
Install the correct cuDNN version, and also copy the DLLs from TensorRT's lib folder into CUDA's bin folder.

Missing zlibwapi.dll during conversion:
Could not locate zlibwapi.dll. Please make sure it is in your library path!

Solutions found online include downloading it from the NVIDIA website (although it seems to have been taken down some time after 2023) or building it from source.
zlib source: https://github.com/madler/zlib
In my case, I searched my machine for an existing zlibwapi.dll (copies turned up under the installed PyTorch path, DingTalk, and Origin) and copied one into CUDA's bin folder, which solved the problem.
Size errors during TensorRT inference:
If the model was exported with dynamic shapes, you need to set the input size yourself, as sketched below.
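A minimal sketch of setting the input size with the TensorRT 8.x Python API, assuming the input binding is named 'input' and the engine was built with a dynamic-shape profile covering 1x3x608x608:

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# Tell TensorRT the actual input shape for this run; it must lie between the
# minShapes and maxShapes used when the engine was built
context.set_binding_shape(engine.get_binding_index('input'), (1, 3, 608, 608))
assert context.all_binding_shapes_specified
# The output bindings now report concrete shapes, which you can use to size buffers
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), context.get_binding_shape(i))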

References

TensorRT basics (1): implementing the model inference pipeline
Basic ONNX operations
[Model deployment] Installing and using TensorRT (Python deployment verified)
Basic steps for deploying a model with TensorRT (C++)
TensorRT optimized deployment (1): TensorRT and ONNX basics
TensorRT fundamentals and applications [C++ deep learning deployment (10)]
Deploying YOLOv5 with TensorRT on Windows: the complete workflow and details
TensorRT Windows C++: a simple tutorial for deploying ONNX models (C++ deployment verified)
TensorRT inference in Python
TensorRT in practice: building and deploying Python inference models (2), converting models with the TensorRT Python API
[TensorRT] Converting ONNX to an engine with the trtexec tool
Converting ONNX to an engine with trtexec
