完整指南:CNStream流处理多路并发框架适配到NVIDIA Jetson Orin (二) 源码架构流程梳理、代码编写

news2024/9/20 16:51:58

目录

1 视频解码代码编写----利用jetson-ffmpeg

1.1 nvstream中视频解码的代码流程框架

1.1.1 类的层次关系

1.1.2 各个类的初始化函数调用层次关系

1.1.3 各个类的process函数调用层次关系

1.2 编写视频解码代码

1.2.1 修改VideoInfo结构体定义

1.2.2 修改解封装代码

1.2.3 decode_impl_nv.hpp

1.2.4 decode_impl_nv.cpp

2 硬件相关的图像格式、内存申请接口、内存释放、内存释放等代码修改

2.1 infer_server/include/common/utils.hpp文件内容修改

 2.2 cuda的四种内存

2.3 infer_server/src/core/device.cpp修改

3 图像缩放、裁剪、色域转换等代码编写----利用CV-CUDA

3.1 nvstream中图像处理代码流程框架

3.1.1 类的层次关系

3.1.2 各个类的初始化函数调用层次关系

3.1.3 各个类的transform函数调用层次关系

3.2 编写图像缩放、裁剪、色域转换等代码

3.2.1 infer_server/src/nv/transform_impl_nv.hpp

3.2.2 infer_server/src/nv/transform_impl_nv.cpp

4 算法推理相关代码修改

4.1 ./infer_server/src/model/model.h

4.2 ./infer_server/src/model/model.cpp

5 其他代码修改

参考文献:


记录下将CNStream流处理多路并发Pipeline框架适配到NVIDIA Jetson AGX Orin的过程,以及过程中遇到的问题,我的jetson盒子是用jetpack5.1.3重新刷机之后的,这是系列博客的第二篇。

另外下面的代码只是初步代码,未编译,未调试。

1 视频解码代码编写----利用jetson-ffmpeg

用jetson-ffmpeg来做视频解码,本质上还是用ffmpeg解码,是不是jetson-ffmpeg底层调用了英伟达的nvmpi做硬件加速。

1.1 nvstream中视频解码的代码流程框架

视频解码的代码流程框架在博客:aclStream流处理多路并发Pipeline框架中 视频解码 代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客

上面博客里面的是详细代码阅读,从中提炼出来简单点的框架

1.1.1 类的层次关系

class FileHandler类里面有个FileHandlerImpl *impl_ = nullptr成员
        -->class FileHandlerImpl
                -->class FFParser 解封装的类
                -->class DeviceDecoder
                       -->class DecodeService
                               -->class IDecoder 只是个虚基类,被用来继承的               
                                       -->IDecoder *decoder_ = CreateDecoder();这里面就是new DecoderAcl()了。 DecoderAcl继承IDecoder 
                                               -->再往下就是各种的硬件相关的类了, 

 上面是类的关系,这次英伟达平台上就是从IDecoder *decoder_ = CreateDecoder();开始写一个新的类然后用来做具体的解码,上层的那些类还是保持不变的。

1.1.2 各个类的初始化函数调用层次关系

FileHandlerImpl::Open()
  std::thread(&FileHandlerImpl::Loop
    PrepareResources()
      parser_.Open
        impl_->Open(url, result, only_key_frame);
          result_->OnParserInfo(info);     
            decoder_->Create(info, &extra)这个decoder_就是DeviceDecoder类
              VdecCreate(void **vdec, VdecCreateParams *params) 这是个单纯的函数,不在任何类里面,函数内容在下一行
                 infer_server::DecodeService::Instance().Create(vdec, params);
                   DecodeService::Create里面有下面两行
                       IDecoder *decoder_ = CreateDecoder()先创建DecoderAcl                    
                       decoder_->Create(params)然后调用DecoderAcl 的create函数,   
                       再往下就是硬件相关的各种类的init和open函数了,这次英伟达的也是要创建个新的类,然后在这里新类的
                       open或者init函数被调用    

1.1.3 各个类的process函数调用层次关系

process分两个方向,一个是从上到下送解码数据的,另一个是解码完之后获取解码数据,然后从下往上回传的。先看从上到下送frame的流程

bool FileHandlerImpl::Process() {
parser_.Parse();
   impl_->Parse();
      result_->OnParserFrame(&frame);
        DeviceDecoder::Process(VideoEsPacket *pkt) {    
          int VdecSendStream(void *vdec, const VdecStream *stream, int timeout_ms)
          {
             return infer_server::DecodeService::Instance().SendStream(vdec, stream, timeout_ms);
           }
              decoder_->SendStream(stream, timeout_ms);这就已经是DecoderAcl的sendstream了,已经到硬件了。
                 vdec_->Decode(data_ptr, data_size, frame_id, this);vdec_是AclLiteVideoProc
                 再往下就不看了,下面几层类都是硬件解码的。

然后看一下从下往上回传的,就是解码之后的数据一层层传到上层的类。

VideoDecoder::DvppVdecCallbackV2(hi_video_frame_info *frame, void *userdata) {
  CallBackVdec(const std::shared_ptr<acllite::ImageData> decoded_image, uint32_t channel_id, uint32_t frame_id, v
     decoder->OnFrame(decoded_image, channel_id, frame_id);这个decoder就是DecoderAcl类,
        create_params_.OnFrame(surf, create_params_.userdata);
            OnFrame_(BufSurface *surf, void *userdata)class DeviceDecoder类
                result_->OnDecodeFrame(wrapper);
                    FileHandlerImpl::OnDecodeFrame

1.2 编写视频解码代码

从上面分析可以知道,需要写一个类,替换掉之前的底层硬件解码类,在jetson上,用jetson-ffmpeg做解码,jetson-ffmpeg会调用硬件加速。

1.2.1 修改VideoInfo结构体定义

首先修改一个结构体的定义,因为之前的cnstream中解码不是用ffmpeg解码的,他只是用ffmpeg解封装,所以这个结构体定义是这样的modules/source/src/video_parser.hpp,他由于不用ffmpeg解码所以不需要 AVCodecParameters* codecpar = nullptr成员。

namespace cnstream {

struct VideoInfo {
  AVCodecID codec_id;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
  int format;
  int width;
  int height;
#endif
  int progressive;
  std::vector<unsigned char> extra_data;
};

...其他代码...

然后在解码的demo那里,easydk/samples/simple_demo/common/video_parser.h,还有一个结构体的定义。

struct VideoInfo {
  AVCodecID codec_id = AV_CODEC_ID_NONE;
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
  AVCodecParameters* codecpar = nullptr;
#endif
  AVCodecContext* codec_ctx = nullptr;
  std::vector<uint8_t> extra_data{};
  int width = 0;
  int height = 0;
  int progressive = 0;
};

所以我这里要修改一下这个结构体的定义,把第一个的结构体定义改成下面的格式。

namespace cnstream {

struct VideoInfo {
  AVCodecID codec_id;
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
  AVCodecParameters* codecpar = nullptr;
#endif
  AVCodecContext* codec_ctx = nullptr;
  std::vector<unsigned char> extra_data;
  int width;
  int height;
  int progressive;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
  int format;
#endif
};

1.2.2 修改解封装代码

然后解封装那里的代码,需要修改一下给AVCodecParameters* codecpar成员赋值。参考CNStream中easydk/samples/simple_demo/common/video_parser.cpp的代码去修改NVStream中modules/source/src/video_parser.cpp的代码,

       int Open(const std::string& url, IParserResult* result, bool only_key_frame = false) {
            std::unique_lock<std::mutex> guard(mutex_);
            if (!result) return -1;
            result_ = result;
            // format context
            fmt_ctx_ = avformat_alloc_context();
            if (!fmt_ctx_) {
                return -1;
            }
            url_name_ = url;

            AVInputFormat* ifmt = NULL;
            // for usb camera
#ifdef HAVE_FFMPEG_AVDEVICE
            const char* usb_prefix = "/dev/video";
            if (0 == strncasecmp(url_name_.c_str(), usb_prefix, strlen(usb_prefix))) {
                // open v4l2 input
#if defined(__linux) || defined(__unix)
                ifmt = av_find_input_format("video4linux2");
                if (!ifmt) {
                    LOGE(SOURCE) << "[" << stream_id_ << "]: Could not find v4l2 format.";
                    return false;
                }
#elif defined(_WIN32) || defined(_WIN64)
                ifmt = av_find_input_format("dshow");
                if (!ifmt) {
                    LOGE(SOURCE) << "[" << stream_id_ << "]: Could not find dshow.";
                    return false;
                }
#else
                LOGE(SOURCE) << "[" << stream_id_ << "]: Unsupported Platform";
                return false;
#endif
            }
#endif
            int ret_code;
            const char* p_rtsp_start_str = "rtsp://";
            if (0 == strncasecmp(url_name_.c_str(), p_rtsp_start_str, strlen(p_rtsp_start_str))) {
                AVIOInterruptCB intrpt_callback = { InterruptCallBack, this };
                fmt_ctx_->interrupt_callback = intrpt_callback;
                last_receive_frame_time_ = GetTickCount();
                // options
                av_dict_set(&options_, "buffer_size", "1024000", 0);
                av_dict_set(&options_, "max_delay", "500000", 0);
                av_dict_set(&options_, "stimeout", "20000000", 0);
                av_dict_set(&options_, "rtsp_flags", "prefer_tcp", 0);
                rtsp_source_ = true;
            }
            else {
                // options
                av_dict_set(&options_, "buffer_size", "1024000", 0);
                av_dict_set(&options_, "max_delay", "500000", 0);
            }

            // open input
            ret_code = avformat_open_input(&fmt_ctx_, url_name_.c_str(), ifmt, &options_);
            if (0 != ret_code) {
                LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't open input stream -- " << url_name_;
                return -1;
            }
            // find video stream information
            ret_code = avformat_find_stream_info(fmt_ctx_, NULL);
            if (ret_code < 0) {
                LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't find stream information -- " << url_name_;
                return -1;
            }

            // fill info
            VideoInfo video_info;
            VideoInfo* info = &video_info;
            int video_index = -1;
            AVStream* st = nullptr;
            for (uint32_t loop_i = 0; loop_i < fmt_ctx_->nb_streams; loop_i++) {
                st = fmt_ctx_->streams[loop_i];
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
                if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
#else
                if (st->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
#endif
                    video_index = loop_i;
                    break;
                }
            }
            if (video_index == -1) {
                LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't find a video stream -- " << url_name_;
                return -1;
            }
            video_index_ = video_index;
            
            info->width = st->codecpar->width;
            info->height = st->codecpar->height;

#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
            info->codec_id = st->codecpar->codec_id;
            int field_order = st->codecpar->field_order;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
            info->format = st->codecpar->format;
#endif
#else
            info->codec_id = st->codec->codec_id;
            int field_order = st->codec->field_order;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
            info->format = st->codec->format;
#endif

#endif

#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
            info->codecpar = fmt_ctx_->streams[video_index_]->codecpar;
#endif
            info->codec_ctx = fmt_ctx_->streams[video_index_]->codec;

            /*At this moment, if the demuxer does not set this value (avctx->field_order == UNKNOWN),
             *   the input stream will be assumed as progressive one.
             */
            switch (field_order) {
            case AV_FIELD_TT:
            case AV_FIELD_BB:
            case AV_FIELD_TB:
            case AV_FIELD_BT:
                info->progressive = 0;
                break;
            case AV_FIELD_PROGRESSIVE:  // fall through
            default:
                info->progressive = 1;
                break;
            }

#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
            unsigned char* extradata = st->codecpar->extradata;
            int extradata_size = st->codecpar->extradata_size;
#else
            unsigned char* extradata = st->codec->extradata;
            int extradata_size = st->codec->extradata_size;
#endif

            if (extradata && extradata_size) {
                info->extra_data.resize(extradata_size);
                memcpy(info->extra_data.data(), extradata, extradata_size);
            }
            // bitstream filter
            bsf_ctx_ = nullptr;
            const AVBitStreamFilter *pfilter{};
            if (strstr(fmt_ctx_->iformat->name, "mp4") || strstr(fmt_ctx_->iformat->name, "flv") ||
                strstr(fmt_ctx_->iformat->name, "matroska")) {
                if (AV_CODEC_ID_H264 == info->codec_id) {
                    pfilter = av_bsf_get_by_name("h264_mp4toannexb");
                }
                else if (AV_CODEC_ID_HEVC == info->codec_id) {
                    pfilter = av_bsf_get_by_name("hevc_mp4toannexb");
                }
                else {
                    pfilter = nullptr;
                }
            }
            if (pfilter == nullptr) {
                bsf_ctx_ = nullptr;
            }
            else {
                av_bsf_alloc(pfilter, &bsf_ctx_);
            }

            if (result_) {
                result_->OnParserInfo(info);
            }
            av_init_packet(&packet_);
            first_frame_ = true;
            eos_reached_ = false;
            open_success_ = true;
            only_key_frame_ = only_key_frame;
            return 0;
        }

1.2.3 decode_impl_nv.hpp

创建个新的类,用来做视频解码,初步代码如下,后面编译以及运行的时候有错误再修改。

#ifndef _DECODE_IMPL_NV_HPP_
#define _DECODE_IMPL_NV_HPP_

#include <atomic>
#include <string>

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include "../decode_impl.hpp"

namespace infer_server {

    class DecodeFFmpeg  : public IDecoder {
    public:
        DecodeFFmpeg()  = default;
        ~DecodeFFmpeg() = default;
        int Create(VdecCreateParams *params) override;
        int Destroy() override;
        int SendStream(const VdecStream *stream, int timeout_ms) override;
        void OnFrame(AVFrame *av_frame_, uint32_t frame_id) override;
        void OnEos() override;
        void OnError(int errcode) override;

    private:
        void ResetFlags();

    private:
        std::atomic<bool> eos_sent_{false};  // flag for acl eos has been sent to decoder
        std::atomic<bool> created_{false};
        VdecCreateParams create_params_;
        AVCodec *av_codec_;
        AVCodecContext* codec_context_{nullptr};
        AVFrame *av_frame_ = nullptr;
        void *transformer_{};
};


}  // namespace 
#endif // DECODE_IMPL_NV_HPP

1.2.4 decode_impl_nv.cpp

#include <iostream>

#include "glog/logging.h"
#include "decode_impl_nv.hpp"
#include "defer.hpp"


namespace infer_server {

    bool DecodeFFmpeg::Create(VdecCreateParams *params) {
        create_params_ = *params;
        
        switch (params->type) {
        case VDEC_TYPE_H264:
            av_codec_ = avcodec_find_decoder(AV_CODEC_ID_H264);
            break;
        case VDEC_TYPE_H265:
            av_codec_ = avcodec_find_decoder(AV_CODEC_ID_H265);
            break;
        case VDEC_TYPE_JPEG:
        default:
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Create(): Unsupported codec type: " << create_params_.type;
            return -1;
        }

        if (!av_codec_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] avcodec_find_decoder failed";
            return false;
        }

        codec_context_ = avcodec_alloc_context3(av_codec_);
        if (!codec_context_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Failed to do avcodec_alloc_context3";
            return false;
        }


        AVDictionary *decoder_opts = nullptr;
        defer([&decoder_opts] {
            if (decoder_opts) av_dict_free(&decoder_opts);
        });

        av_dict_set_int(&decoder_opts, "device_id", 0, 0);

        if (avcodec_open2(codec_context_, av_codec_,  &decoder_opts) < 0) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Failed to open codec";
            return false;
        }
        av_frame_ = av_frame_alloc();
        if (!av_frame_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Could not alloc frame";
            return false;
        } 

        ResetFlags();

        //待修改
        //这里还需要增加创建transform用来做图像缩放的代码。

        created_ = true;

        return true;
    }


    int DecodeFFmpeg::SendStream(const VdecStream *stream, int timeout_ms) {

        if (!created_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] SendStream(): Decoder is not created";
            return -1;
        }

        if (nullptr == stream || nullptr == stream->bits) {
            if (eos_sent_) {
                LOG(WARNING) << "[InferServer] [DecodeFFmpeg] SendStream(): EOS packet has been send";
                return 0;
            }
             
            AVPacket framePacket = {};
            av_init_packet(&framePacket);
        
            framePacket.data = nullptr;
            framePacket.size = 0;
            
            avcodec_send_packet(decode_, &packet);
            // flush all frames ...
            int ret = 0;
            do {
                ret = avcodec_receive_frame(decode_, av_frame_);
                if(ret >= 0)
                {
                    OnFrame(av_frame_, stream->pts);
                }

            } while (ret >= 0);
            eos_sent_ = true;
            OnEos();
        }
        else{
            if (eos_sent_) {
                LOG(ERROR) << "[InferServer] [DecodeFFmpeg] SendStream(): EOS has been sent, process packet failed, pts:"
                    << stream->pts;
                return -1;
            }
            AVPacket framePacket = {};
            av_init_packet(&framePacket);
        
            framePacket.data = stream.bits;
            framePacket.size = stream.len;

            //开始解码
            int ret = avcodec_send_packet(codec_context_, framePacket);
            if (ret < 0) {
                LOG(ERROR) << "[InferServer] [DecodeFFmpeg] avcodec_send_packet failed, data ptr, size:"
                        << framePacket->data << ", " << framePacket->size;
                return false;
            }
            ret = avcodec_receive_frame(codec_context_, av_frame_);
            OnFrame(av_frame_, stream->pts);
        }
    }


    void DecodeFFmpeg::OnFrame(AVFrame *av_frame_, uint32_t frame_id) {

        BufSurface *surf = nullptr;
        if (create_params_.GetBufSurf(&surf, av_frame_->width, av_frame_->height, CastColorFmt(av_frame_->format),
            create_params_.surf_timeout_ms, create_params_.userdata) < 0) {
            LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): Get BufSurface failed";
            OnError(-1);
            return;
        }
        if (surf->mem_type != BUF_MEMORY_DVPP) {
            LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): BufSurface memory type must be BUF_MEMORY_DVPP";
            return;
        }

        switch (av_frame_->format) {
        case acllite::ImageFormat::YUV_SP_420:
        case acllite::ImageFormat::YVU_SP_420:
            if (surf->surface_list[0].width != av_frame_->width || surf->surface_list[0].height != av_frame_->height) {
                BufSurface transform_src;
                BufSurfaceParams src_param;
                memset(&transform_src, 0, sizeof(BufSurface));
                memset(&src_param, 0, sizeof(BufSurfaceParams));

                src_param.color_format = CastColorFmt(av_frame_->format);
                src_param.data_size = codec_image->size;//待修改。
                src_param.data_ptr = reinterpret_cast<void *>(codec_image->data.get());//待修改。

                VLOG(5) << "[InferServer] [DecoderAcl] OnFrame(): codec_frame: "
                    << " width = " << av_frame_->width
                    << ", height = " << av_frame_->height
                    << ", width stride = " << av_frame_->alignWidth
                    << ", height stride = " << av_frame_->alignHeight;
                VLOG(5) << "[InferServer] [DecoderAcl] OnFrame(): surf->surface_list[0]: "
                    << " width = " << surf->surface_list[0].width
                    << ", height = " << surf->surface_list[0].height
                    << ", width stride = " << surf->surface_list[0].width_stride
                    << ", height stride = " << surf->surface_list[0].height_stride;

                src_param.width = av_frame_->width;
                src_param.height = av_frame_->height;
                src_param.width_stride = codec_image->alignWidth;//待修改。
                src_param.height_stride = codec_image->alignHeight;//待修改。

                transform_src.batch_size = 1;
                transform_src.num_filled = 1;
                transform_src.device_id = create_params_.device_id;
                transform_src.mem_type = BUF_MEMORY_DVPP;
                transform_src.surface_list = &src_param;

                TransformParams trans_params;
                memset(&trans_params, 0, sizeof(trans_params));
                trans_params.transform_flag = TRANSFORM_RESIZE_SRC;

                if (Transform(transformer_, &transform_src, surf, &trans_params) < 0) {
                    LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): Transfrom failed";
                    break;
                }
            }
            else {
                std::chrono::high_resolution_clock::time_point tnow = std::chrono::high_resolution_clock::now();
                CALL_ACL_FUNC(acllite::CopyDataToHostEx(surf->surface_list[0].data_ptr, codec_image->size, codec_image->data.get(), codec_image->size, codec_image->deviceId)
                   , "[DecoderAcl] OnFrame(): copy codec buffer data to surf failed");
                             
                std::chrono::high_resolution_clock::time_point tpost = std::chrono::high_resolution_clock::now();
                //std::cout << "<<<<<<================================ CopyDataToHostEx time = " << std::chrono::duration_cast<std::chrono::duration<double>>(tpost - tnow).count() * 1000 << " ms" << std::endl;
   
            }
            break;
        default:
            break;
        }

        surf->pts = frame_id;
        
        //std::chrono::high_resolution_clock::time_point tnow = std::chrono::high_resolution_clock::now();
        create_params_.OnFrame(surf, create_params_.userdata);
        //std::chrono::high_resolution_clock::time_point tpost = std::chrono::high_resolution_clock::now();
        //std::cout << "<<<<<<================================ create_params_.OnFrame time = " << std::chrono::duration_cast<std::chrono::duration<double>>(tpost - tnow).count() * 1000 << " ms" << std::endl;
    }

    void DecodeFFmpeg::Destroy() {

        if (!created_) {
            LOG(WARNING) << "[InferServer] [DecoderAcl] Destroy(): Decoder is not created";
            return 0;
        }
        
        // if error happened, destroy directly, eos maybe not be transmitted from the decoder
        if (!eos_sent_) {
            SendStream(nullptr, 10000);
        }

        ResetFlags();

        if (av_frame_) {
            av_frame_free(&av_frame_);
            av_frame_ = nullptr;
        }
        if (codec_context_) {
            avcodec_close(codec_context_);
            avcodec_free_context(&codec_context_);
            codec_context_ = nullptr;
        }

        //待修改,还需要增加销毁transform的相关代码。
    }


    DecodeFFmpeg::~DecodeFFmpeg() {
        DecodeFFmpeg::Destroy();
    }

    void DecodeFFmpeg::ResetFlags() {
        eos_sent_ = false;
        created_ = false;
    }

    void DecodeFFmpeg::OnEos() {
        create_params_.OnEos(create_params_.userdata);
    }
   
    void DecodeFFmpeg::OnError(int errcode) {
        //convert the error code
        create_params_.OnError(static_cast<int>(errcode), create_params_.userdata);
    }

}  // namespace 

2 硬件相关的图像格式、内存申请接口、内存释放、内存释放等代码修改

infer_server/include/common/utils.hpp文件内容如下

/*************************************************************************
 * Copyright (C) [2022] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#ifndef COMMON_UTILS_HPP_
#define COMMON_UTILS_HPP_

#include <string>

#include <glog/logging.h>

#include "buf_surface.h"

#include "AclLite/AclLite.h"

#define _SAFECALL(func, expected, msg, ret_val)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
      return (ret_val);                                                                             \
        }                                                                                               \
    } while (0)

#define ACL_SAFECALL(func, msg, ret_val) _SAFECALL(func, acllite::ACLLITE_OK, msg, ret_val)

#define _CALLFUNC(func, expected, msg)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
                }                                                                                               \
        } while (0)

#define CALL_ACL_FUNC(func, msg) _CALLFUNC(func, acllite::ACLLITE_OK, msg)

inline BufSurfaceMemType CastMemoryType(acllite::MemoryType type) noexcept{
    switch (type) {
#define RETURN_MEMORY_TYPE(type) \
  case acllite::MemoryType::type:         \
    return BUF_##type;
        RETURN_MEMORY_TYPE(MEMORY_HOST)
            RETURN_MEMORY_TYPE(MEMORY_DEVICE)
            RETURN_MEMORY_TYPE(MEMORY_DVPP)
            RETURN_MEMORY_TYPE(MEMORY_NORMAL)
#undef RETURN_MEMORY_TYPE
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return BUF_MEMORY_HOST;
    }
}

inline acllite::MemoryType CastMemoryType(BufSurfaceMemType type) noexcept{
    switch (type) {
#define RETURN_MEMORY_TYPE(type)    \
  case BUF_##type: \
    return acllite::MemoryType::type;
        RETURN_MEMORY_TYPE(MEMORY_HOST)
            RETURN_MEMORY_TYPE(MEMORY_DEVICE)
            RETURN_MEMORY_TYPE(MEMORY_DVPP)
            RETURN_MEMORY_TYPE(MEMORY_NORMAL)
#undef RETURN_MEMORY_TYPE
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return acllite::MemoryType::MEMORY_HOST;
    }
}

inline BufSurfaceColorFormat CastColorFmt(acllite::ImageFormat format) {
    static std::map<acllite::ImageFormat, BufSurfaceColorFormat> color_map{
        { acllite::ImageFormat::YUV_SP_420, BUF_COLOR_FORMAT_NV12 },
        { acllite::ImageFormat::YVU_SP_420, BUF_COLOR_FORMAT_NV21 },
        { acllite::ImageFormat::RGB_888, BUF_COLOR_FORMAT_RGB },
        { acllite::ImageFormat::BGR_888, BUF_COLOR_FORMAT_BGR },
    };
    return color_map[format];
}

inline acllite::ImageFormat CastColorFmt(BufSurfaceColorFormat format) {
    static std::map<BufSurfaceColorFormat, acllite::ImageFormat> color_map{
        { BUF_COLOR_FORMAT_NV12, acllite::ImageFormat::YUV_SP_420 },
        { BUF_COLOR_FORMAT_NV21, acllite::ImageFormat::YVU_SP_420 },
        { BUF_COLOR_FORMAT_RGB, acllite::ImageFormat::RGB_888 },
        { BUF_COLOR_FORMAT_BGR, acllite::ImageFormat::BGR_888 },
    };
    return color_map[format];
}

#endif  // COMMON_UTILS_HPP_

上面这个文件内容目前是改成了华为acl相关的,现在把整个成功移植到英伟达的Jetson,那么这个文件的内容也要修改,另外,整个工程中所有用到了ACL_SAFECALL和CALL_ACL_FUNC的地方,不仅要把ACL_SAFECALL和CALL_ACL_FUNC这两个名字改掉,还要把用到这两个名字的硬件相关接口全都修改掉,比如

比如这张截图中,所有的这些acllite::AclLiteMalloc   acllite::AclLiteFree acllite::AclLiteMemcpy这些接口都要相应的改成英伟达Jetson平台的接口。

2.1 infer_server/include/common/utils.hpp文件内容修改

/*************************************************************************
 * Copyright (C) [2022] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#ifndef COMMON_UTILS_HPP_
#define COMMON_UTILS_HPP_

#include <string>

#include <glog/logging.h>

#include "buf_surface.h"

#include "cuda_runtime.h"
#include "nvcv/ImageFormat.h"

#define _SAFECALL(func, expected, msg, ret_val)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
      return (ret_val);                                                                             \
        }                                                                                               \
    } while (0)

#define CUDA_SAFECALL(func, msg, ret_val) _SAFECALL(func, cudaSuccess, msg, ret_val)

#define _CALLFUNC(func, expected, msg)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
                }                                                                                               \
        } while (0)

#define CALL_CUDA_FUNC(func, msg) _CALLFUNC(func, cudaSuccess, msg)

inline BufSurfaceMemType CastMemoryType(cudaMemoryType type) noexcept{
    switch (type) {
    case cudaMemoryTypeUnregistered:
        return BUF_MEMORY_UNREGISTERED;
    case cudaMemoryTypeHost:
        return BUF_MEMORY_HOST;
    case cudaMemoryTypeDevice:
        return BUF_MEMORY_DEVICE;
    case cudaMemoryTypeManaged:
        return BUF_MEMORY_MANAGED;
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return BUF_MEMORY_HOST;
    }
}

inline cudaMemoryType CastMemoryType(BufSurfaceMemType type) noexcept{
    switch (type) {
    case BUF_MEMORY_UNREGISTERED:
        return cudaMemoryTypeUnregistered;
    case BUF_MEMORY_HOST:
        return cudaMemoryTypeHost;
    case BUF_MEMORY_DEVICE:
        return cudaMemoryTypeDevice;
    case BUF_MEMORY_MANAGED:
        return cudaMemoryTypeManaged;
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return cudaMemoryTypeHost;
    }
}

inline BufSurfaceColorFormat CastColorFmt(NVCVImageFormat format) {
    static std::map<NVCVImageFormat, BufSurfaceColorFormat> color_map{
        { NVCV_IMAGE_FORMAT_NV12, BUF_COLOR_FORMAT_NV12 },
        { NVCV_IMAGE_FORMAT_NV21, BUF_COLOR_FORMAT_NV21 },
        { NVCV_IMAGE_FORMAT_RGB8, BUF_COLOR_FORMAT_RGB },
        { NVCV_IMAGE_FORMAT_BGR8, BUF_COLOR_FORMAT_BGR },
    };
    return color_map[format];
}

inline NVCVImageFormat CastColorFmt(BufSurfaceColorFormat format) {
    static std::map<BufSurfaceColorFormat, NVCVImageFormat> color_map{
        { BUF_COLOR_FORMAT_NV12, NVCV_IMAGE_FORMAT_NV12 },
        { BUF_COLOR_FORMAT_NV21, NVCV_IMAGE_FORMAT_NV21 },
        { BUF_COLOR_FORMAT_RGB, NVCV_IMAGE_FORMAT_RGB8 },
        { BUF_COLOR_FORMAT_BGR, NVCV_IMAGE_FORMAT_BGR8 },
    };
    return color_map[format];
}

#endif  // COMMON_UTILS_HPP_

 2.2 cuda的四种内存

在CUDA编程中,内存类型是指定数据应该存储在哪种类型的内存中的关键概念。CUDA支持多种内存类型,每种类型都有其特定的用途和性能特点。以下是你提到的几种CUDA内存类型的解释和区别:

  1. cudaMemoryTypeUnregistered

    • 这个枚举值表示内存类型未注册。在CUDA中,通常不会直接使用这个值,因为它表示内存没有被明确指定为主机或设备内存。
  2. cudaMemoryTypeHost

    • 这表示内存是分配在主机(CPU)上的。主机内存可以被CPU直接访问,但GPU访问它通常较慢,因为需要通过PCIe总线进行数据传输。这种内存类型适用于需要CPU频繁访问的数据,或者在CPU和GPU之间需要频繁传输数据的场景。
  3. cudaMemoryTypeDevice

    • 这表示内存是分配在设备(GPU)上的。设备内存只能被GPU直接访问,CPU访问它需要通过CUDA的内存复制操作。这种内存类型适用于GPU计算密集型任务,因为数据已经在GPU上,可以减少数据传输的开销。
  4. cudaMemoryTypeManaged

    • 这表示内存是“统一内存”,即由CUDA统一管理的内存。在这种内存类型下,数据可以同时被CPU和GPU访问,而不需要显式的数据复制操作。CUDA运行时会自动处理数据在主机和设备之间的迁移。这种内存类型简化了内存管理,但可能会引入额外的性能开销,因为运行时需要决定何时以及如何迁移数据。

区别总结:

  • cudaMemoryTypeHost 和 cudaMemoryTypeDevice 提供了明确的内存位置,分别对应CPU和GPU,适用于对性能有明确要求的场景。
  • cudaMemoryTypeManaged 提供了更简单的编程模型,但可能会牺牲一些性能,因为数据迁移是自动和隐式的。
  • cudaMemoryTypeUnregistered 通常不用于实际编程,它更多是一个占位符,表示内存类型尚未确定。

2.3 infer_server/src/core/device.cpp修改

/*************************************************************************
* Copyright (C) [2020] by Cambricon, Inc. All rights reserved
*
*  Licensed under the Apache License, Version 2.0 (the "License");
*  you may not use this file except in compliance with the License.
*  You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*************************************************************************/

#include <glog/logging.h>

#include "nvis/infer_server.h"

namespace infer_server {
    cudaMemcpyKind GetMemcpyKind(BufSurfaceMemType src_mem_type, BufSurfaceMemType dst_mem_type) {
        // 确保未注册内存不会被使用
        assert(src_mem_type != BUF_MEMORY_UNREGISTERED);
        assert(dst_mem_type != BUF_MEMORY_UNREGISTERED);

        // 根据源和目标内存类型确定 cudaMemcpyKind
        if (src_mem_type == BUF_MEMORY_HOST && dst_mem_type == BUF_MEMORY_HOST) {
            return cudaMemcpyHostToHost;
        }
        else if (src_mem_type == BUF_MEMORY_HOST && dst_mem_type == BUF_MEMORY_DEVICE) {
            return cudaMemcpyHostToDevice;
        }
        else if (src_mem_type == BUF_MEMORY_DEVICE && dst_mem_type == BUF_MEMORY_HOST) {
            return cudaMemcpyDeviceToHost;
        }
        else if (src_mem_type == BUF_MEMORY_DEVICE && dst_mem_type == BUF_MEMORY_DEVICE) {
            return cudaMemcpyDeviceToDevice;
        }
        else if (src_mem_type == BUF_MEMORY_MANAGED || dst_mem_type == BUF_MEMORY_MANAGED) {
            // 管理内存可以视为主机或设备内存
            return cudaMemcpyDefault;
        }

        // 默认情况下返回 cudaMemcpyDefault
        return cudaMemcpyDefault;
    }

    bool SetCurrentDevice(int device_id) noexcept{
        CUDA_SAFECALL(cudaSetDevice(device_id), "Set device failed", false);
        VLOG(3) << "[InferServer] SetCurrentDevice(): Set device [" << device_id << "] for this thread";
        return true;
    }

    uint32_t TotalDeviceCount() noexcept{
        uint32_t dev_cnt;
        CUDA_SAFECALL(cudaGetDeviceCount(dev_cnt), "Set device failed", 0);
        return dev_cnt;
    }

    bool CheckDevice(int device_id) noexcept{
        uint32_t dev_cnt;
        CUDA_SAFECALL(cudaGetDeviceCountdev_cnt), "Check device failed", false);
        return device_id < static_cast<int>(dev_cnt) && device_id >= 0;
    }

    void* MallocDeviceMem(size_t size) noexcept{
        void *device_ptr{};
        CUDA_SAFECALL(cudaMallocManaged(&device_ptr, size), "Malloc device memory failed", nullptr);
        return device_ptr;
    }

    int FreeDeviceMem(void *p) noexcept{
        CUDA_SAFECALL(cudaFree(p), "Free device memory failed", -1);
        return 0;
    }

    void* AllocHostMem(size_t size) noexcept{
        void *host_ptr{};
        CUDA_SAFECALL(cudaMallocHost(&host_ptr, size), "Malloc host memory failed", nullptr);
        return host_ptr;
    }

    int FreeHostMem(void *p) noexcept{
        CUDA_SAFECALL(cudaFreeHost(p), "Free host memory failed", -1);
    }

    int MemcpyHD(void* dst, BufSurfaceMemType dst_mem_type, void* src, BufSurfaceMemType src_mem_type, size_t size) noexcept{
        cudaMemcpyKind cpy_type = GetMemcpyKind(src_mem_type, dst_mem_type);
        CUDA_SAFECALL(cudaMemcpy(dst, src, size, cpy_type), "Memcpy HD failed", -1);
        return 0;
    }

    bool IsItegratedGPU(int device_id) {
        static int s_integrated = [device_id]() {
            cudaDeviceProp prop;
            CUDA_SAFECALL(cudaGetDeviceProperties(&prop, device_id));
        };
        return s_integrated == 1;
    }

    int GetCurrentDevice(int& device_id) noexcept{
        CUDA_SAFECALL(cudaGetDevice(&device_id));
        return 0;
    }

}  // namespace infer_server

3 图像缩放、裁剪、色域转换等代码编写----利用CV-CUDA

3.1 nvstream中图像处理代码流程框架

和前面一样,先大体看一下nvstream中图像处理的代码流程框架。

3.1.1 类的层次关系

class TransformService
    class ITransformer 只是个基类,被用来继承的,
    class TransformerAcl : public ITransformer 然后就是硬件处理的了
        AclLiteImageProc

现在应该是直接把TransformerAcl 和 AclLiteImageProc合成一个英伟达的类。

3.1.2 各个类的初始化函数调用层次关系

在视频解码类的create函数里面,或者在算法的Preproc都会有这样一行
if (TransformCreate(&transformer_, &config) != 0)这个transformer_ 是视频解码类或者算法预处理类的一个成员。
    这个create函数是这样的,
    int TransformCreate(void **transformer, TransformConfigParams *params) {
        return infer_server::TransformService::Instance().Create(transformer, params);
    }
    然后TransformService类的create函数里面有         
        ITransformer *transformer_ = CreateTransformer(); CreateTransformer里面就是new TransformerAcl也就是具体的硬件处理类了    
        transformer_->Create(params)
        *transformer = transformer_;回传给解码的那个类或者算法预处理类,

3.1.3 各个类的transform函数调用层次关系

具体硬件解码处理类的OnFrame函数里面会有一个
Transform(transformer_, &transform_src, surf, &trans_params)
    应该是调用了这个
    int Transform(void *transformer, BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        return infer_server::TransformService::Instance().Transform(transformer, src, dst, transform_params);
    }
        然后到了这里
        int Transform(void *transformer, BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
            if (!dst || !src || !transform_params) {
                LOG(ERROR) << "[InferServer] [TransformService] Transform(): src, dst BufSurface or parameters pointer is invalid";
                return -1;
            }

            ITransformer *transformer_ = static_cast<ITransformer *>(transformer); 那个具体的硬件处理类就是继承的ITransformer 
            return transformer_->Transform(src, dst, transform_params);这就到了具体硬件transform类的函数了,
        }

3.2 编写图像缩放、裁剪、色域转换等代码

3.2.1 infer_server/src/nv/transform_impl_nv.hpp


#ifndef TRANSFORM_IMPL_NV_HPP_
#define TRANSFORM_IMPL_NV_HPP_

#include <algorithm>
#include <cstring>  // for memset
#include <atomic>

#include "../transform_impl.hpp"
#include "transform.h"

namespace infer_server {

    class TransformerNV : public ITransformer {
    public:
        TransformerNV() {

        }
        ~TransformerNV() = default;

        int Create(TransformConfigParams *params) override;
        int Destroy() override;
        int Transform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) override;

    private:
        int DoNVTransform(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVCrop(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVCropResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVCropResizePaste(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVConvertFormat(BufSurface *src, BufSurface *dst, TransformParams *transform_params);

    private:
        TransformConfigParams create_params_;
        cudaStream_t* cu_stream_{nullptr};
        std::shared_ptr<nvcv::ITensor> crop_tensor_;
        std::shared_ptr<nvcv::ITensor> resized_tensor_;
        std::shared_ptr<nvcv::ITensor> cvtcolor_tensor_;
        std::shared_ptr<nvcv::ITensor> copymakeborder_tensor_;

        std::shared_ptr<cvcuda::CustomCrop> crop_op_;    
        std::shared_ptr<cvcuda::Resize> resize_op_;
        std::shared_ptr<cvcuda::CvtColor> cvtcolor_op_;
        std::shared_ptr<cvcuda::CopyMakeBorder> copymakeborder_op_;
        

        std::atomic<bool> created_{};
    };

}  // namespace 

#endif  // TRANSFORM_IMPL_MLU370_HPP_

3.2.2 infer_server/src/nv/transform_impl_nv.cpp



#include "transform_impl_nv.hpp"

#include <algorithm>
#include <atomic>
#include <cstring>  // for memset
#include <map>
#include <memory>
#include <string>
#include <vector>

#include "glog/logging.h"

namespace infer_server {
    int TransformerNV::Create(TransformConfigParams *params) {
        create_params_ = *params;
        
        if(nullptr == cu_stream_)
        {
            cuCreateStream(&cu_stream_, create_params_.device_id);
        }
        
        if (!crop_op_){
            crop_op_ = std::make_shared<cvcuda::CustomCrop>();
        }
        
        if (!cvtcolor_op_){
            cvtcolor_op_ = std::make_shared<cvcuda::CvtColor>();
        }

        if (!resize_op_){
            resize_op_ = std::make_shared<cvcuda::Resize>();
        }

        if (!copymakeborder_op_){
            copymakeborder_op_ = std::make_shared<cvcuda::CopyMakeBorder>();
        }

        created_ = true;
        return 0;
    }

    int TransformerNV::Destroy() {
        if (!created_) {
            LOG(WARNING) << "[InferServer] [TransformerNV] Destroy(): Transformer is not created";
            return 0;
        }

        if (cu_stream_ != nullptr) {
            cuDestroyStream(cu_stream_);
            cu_stream_ = nullptr;
        }

        cvtcolor_op_.reset();
        resize_op_.reset();
        crop_op_.reset();
        copymakeborder_op_.reset();

        return 0;
    }

    int TransformerNV::Transform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        if (!created_) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): transformer is not created";
            return -1;
        }

        if (src->num_filled > dst->batch_size) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The number of inputs exceeds batch size: "
                << src->num_filled << " v.s. " << dst->batch_size;
            return -1;
        }

        if (src->device_id != dst->device_id || src->device_id != create_params_.device_id) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The device id of src, dst and transformer is not the same: src device: " << src->device_id
                << " , dst device: " << dst->device_id
                << " , transformer device: " << create_params_.device_id;
            return -1;
        }

        if (src->mem_type != BUF_MEMORY_DVPP) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The src and dst mem_type must be BUF_MEMORY_DVPP";
            return -1;
        }

        if (src->surface_list[0].data_size == 0) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): Input data size is 0";
            return -1;
        }

        return DoNVTransform(src, dst, transform_params);
    }

    int TransformerNV::DoNVTransform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        switch (transform_params->transform_flag) {
        case TRANSFORM_RESIZE_SRC:
            return NVResize(src, dst, transform_params);
            break;
        case TRANSFORM_CROP_SRC:
            return NVCrop(src, dst, transform_params);
            break;
        case TRANSFORM_CROP_RESIZE_SRC:
            return NVCropResize(src, dst, transform_params);
            break;
        case TRANSFORM_CROP_RESIZE_PASTE_SRC:
            return NVCropResizePaste(src, dst, transform_params);
            break;
        default:
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): Transform flag not supported currently, flag = " << transform_params->transform_flag;
            return -1;
        }
    }

    int TransformerNV::NVResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        if (src->batch_size != 1 || dst->batch_size != 1) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): NVResize now only support src/dst one image, src batch size = " << src->batch_size <<" , dst batch size = " << dst->batch_size;
            return -1;
        }

        auto& src_surf = src->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);
    
        if (!resized_tensor_){
            resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {dst.width_stride, src_surf.height}, nvcv::FMT_BGR8));
        }

        (*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *resized_tensor_, NVCV_INTERP_LINEAR);

        auto& dst_surf = dst->surface_list[0];
        auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        resized_tensor_.reset();
        
        return 0;
    }

    int TransformerNV::NVCrop(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        auto& src_surf = src->surface_list[0];
        auto& dst_surf = dst->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);
    
        if (!crop_tensor_){
            crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {dst_surf.width, dst_surf.height}, nvcv::FMT_BGR8));
        }
        
        TransformRect rect = transform_params->src_rect[0];
        rect.left = rect.left >= src_surf.width ? 0 : rect.left;
        rect.top = rect.top >= src_surf.height ? 0 : rect.top;
        rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
        rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
        NVCVRectI crpRect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };

        (*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crpRect);


        auto out_data = crop_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        crop_tensor_.reset();
        
        return 0;
    }

    int TransformerNV::NVCropResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {

        auto& src_surf = src->surface_list[0];
        auto& dst_surf = dst->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);

        
        TransformRect rect = transform_params->src_rect[0];
        rect.left = rect.left >= src_surf.width ? 0 : rect.left;
        rect.top = rect.top >= src_surf.height ? 0 : rect.top;
        rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
        rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
        NVCVRectI crpRect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };
            
        if (!crop_tensor_){
            crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {rect.width, rect.height}, nvcv::FMT_BGR8));
        }

        (*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crpRect);
        
        if (!resized_tensor_){
            resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, { dst_surf.width, dst_surf.height }, nvcv::FMT_BGR8));
        }

        (*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *crop_tensor_, *resized_tensor_, NVCV_INTERP_LINEAR);

        auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        crop_tensor_.reset();
        resized_tensor_.reset();

        return 0;
    }

    int TransformerNV::NVCropResizePaste(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        auto& src_surf = src->surface_list[0];
        auto& dst_surf = dst->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);

        
        TransformRect rect = transform_params->src_rect[0];
        rect.left = rect.left >= src_surf.width ? 0 : rect.left;
        rect.top = rect.top >= src_surf.height ? 0 : rect.top;
        rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
        rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
        NVCVRectI crop_src_rect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };

        TransformRect dst_rect = transform_params->dst_rect[0];
        dst_rect.left = dst_rect.left >= src_surf.width ? 0 : dst_rect.left;
        dst_rect.top = dst_rect.top >= src_surf.height ? 0 : dst_rect.top;
        dst_rect.width = dst_rect.width <= 0 ? (src_surf.width - dst_rect.left) : dst_rect.width;
        dst_rect.height = dst_rect.height <= 0 ? (src_surf.height - dst_rect.top) : dst_rect.height;
        NVCVRectI paste_dst_rect(dst_rect.left, dst_rect.top, dst_rect.left + dst_rect.width - 1, dst_rect.top + dst_rect.height - 1);
            
        if (!crop_tensor_){
            crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {rect.width, rect.height}, nvcv::FMT_BGR8));
        }

        (*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crop_src_rect);
        
        if (!resized_tensor_){
            resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {  dst_rect.width,  dst_rect.height }, nvcv::FMT_BGR8));
        }

        (*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *crop_tensor_, *resized_tensor_, NVCV_INTERP_LINEAR);


        if (!copymakeborder_tensor_){
            copymakeborder_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, { dst_surf.width, dst_surf.height }, nvcv::FMT_BGR8));
        }

       (*copymakeborder_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *resized_tensor_, *copymakeborder_tensor_, dst_rect.top, dst_rect.left, NVCV_BORDER_CONSTANT, 0);

        auto out_data = copymakeborder_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        crop_tensor_.reset();
        resized_tensor_.reset();
        copymakeborder_tensor_.reset();

        return 0;
    }

    int TransformerNV::NVConvertFormat(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {

        auto& src_surf = src->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_NV12);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);
    
        if (!cvtcolor_tensor_){
            cvtcolor_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8));
        }

        (*cvtcolor_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *cvtcolor_tensor_, NVCV_COLOR_YUV2BGR_NV12);

        auto& dst_surf = dst->surface_list[0];
        auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        cvtcolor_tensor_.reset();
        return 0;
    }


}  // namespace 

4 算法推理相关代码修改

4.1 ./infer_server/src/model/model.h

/*************************************************************************
 * Copyright (C) [2020] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#ifndef INFER_SERVER_MODEL_H_
#define INFER_SERVER_MODEL_H_

#include <glog/logging.h>
#include <algorithm>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

#include "nvis/infer_server.h"
#include "nvis/processor.h"
#include "nvis/shape.h"

namespace trt {
#include "trteng_exp/export_funtions.h"
}

namespace infer_server {
    using TEngine = std::unique_ptr<trt::trtnet_t>;

    class Model;
    class ModelRunner {
    public:
        explicit ModelRunner(int device_id) : device_id_(device_id) {}
        ModelRunner(const ModelRunner& other) = delete;
        ModelRunner& operator=(const ModelRunner& other) = delete;
        ModelRunner(ModelRunner&& other) = default;
        ModelRunner& operator=(ModelRunner&& other) = default;
        ~ModelRunner() = default;

        bool Init(TEngine engine) noexcept;
        Status Run(ModelIO* input, ModelIO* output) noexcept;  // NOLINT

        void SetInputNum(uint32_t  i_num) noexcept{ input_num_ = i_num; }
        void SetOutputNum(uint32_t  o_num) noexcept{ output_num_ = o_num; }

        void SetInputShapes(std::vector<Shape> const& i_shapes) noexcept{ i_shapes_ = i_shapes; }
        void SetOutputShapes(std::vector<Shape> const& o_shapes) noexcept{ o_shapes_ = o_shapes; }

        void SetInputLayouts(std::vector<DataLayout> const& i_layouts) noexcept{ i_layouts_ = i_layouts; }
        void SetOutputLayouts(std::vector<DataLayout> const& o_layouts) noexcept{ o_layouts_ = o_layouts; }

    private:
        TEngine engine_{};

        uint32_t input_num_{};
        uint32_t output_num_{};

        std::vector<Shape> i_shapes_;
        std::vector<Shape> o_shapes_;
        std::vector<DataLayout> i_layouts_;
        std::vector<DataLayout> o_layouts_;

        int device_id_{};
    };  // class RuntimeContext

    class Model : public ModelInfo {
    public:
        Model() = default;
        bool Init(const std::string& model_path) noexcept;
        ~Model();

        bool HasInit() const noexcept{ return has_init_; }

            const Shape& InputShape(int index) const noexcept override{
            CHECK(index < i_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
            return input_shapes_[index];
        }
            const Shape& OutputShape(int index) const noexcept override{
            CHECK(index < o_num_ || index >= 0) << "[InferServer] [Model] Output shape index overflow";
            return output_shapes_[index];
        }
            const DataLayout& InputLayout(int index) const noexcept override{
            CHECK(index < i_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
            return i_layouts_[index];
        }
            const DataLayout& OutputLayout(int index) const noexcept override{
            CHECK(index < o_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
            return o_layouts_[index];
        }
        uint32_t InputNum() const noexcept override{ return i_num_; }
        uint32_t OutputNum() const noexcept override{ return o_num_; }
        uint32_t BatchSize() const noexcept override{ return model_batch_size_; }

        bool FixedOutputShape() noexcept override{ return FixedShape(output_shapes_); }

        std::shared_ptr<ModelRunner> GetRunner(int device_id) noexcept{
            trt::ErrInfo ei{};
            trt::trtnet_t* net = trt::load_net_from_file(model_file_.data(), &ei);
            CHECK(!net) << "trt::load_net_from_file failed: model file: " << te2fullpath << ", error: " << ei.errmsg;
            TEngine engine = TEngine(net, [](trt::trtnet_t* n) { trt::release_net(n); });

            auto runner = std::make_shared<ModelRunner>(device_id);
            runner->SetInputNum(i_num_);
            runner->SetOutputNum(o_num_);
            runner->SetInputShapes(input_shapes_);
            runner->SetOutputShapes(output_shapes_);
            runner->SetInputLayouts(i_layouts_);
            runner->SetOutputLayouts(o_layouts_);     

            if (!runner->Init(std::move(engine))) return nullptr;
            return runner;
        }

        std::string GetKey() const noexcept override{ return model_file_; }

    private:
        bool GetModelInfo(trt::trtnet_t* net) noexcept;
        bool FixedShape(const std::vector<Shape>& shapes) noexcept{
            for (auto &shape : shapes) {
                auto vectorized_shape = shape.Vectorize();
                if (!std::all_of(vectorized_shape.begin(), vectorized_shape.end(), [](int64_t v) { return v > 0; })) {
                    return false;
                }
            }
            return !shapes.empty();
        }

        Model(const Model&) = delete;
        Model& operator=(const Model&) = delete;

    private:
        std::string model_file_;

        std::vector<DataLayout> i_layouts_, o_layouts_;
        std::vector<Shape> input_shapes_, output_shapes_;
        int i_num_{}, o_num_{};
        uint32_t model_batch_size_{ 1 };
        bool has_init_{ false };
    };  // class Model

    // use environment CNIS_MODEL_CACHE_LIMIT to control cache limit
    class ModelManager {
    public:
        static ModelManager* Instance() noexcept;

        void SetModelDir(const std::string& model_dir) noexcept{ model_dir_ = model_dir; }

        ModelPtr Load(const std::string& model_file) noexcept;
        ModelPtr Load(void* mem_ptr, size_t size) noexcept;
        bool Unload(ModelPtr model) noexcept;

        void ClearCache() noexcept;

        int CacheSize() noexcept;

        std::shared_ptr<Model> GetModel(const std::string& name) noexcept;

    private:
        std::string DownloadModel(const std::string& url) noexcept;
        void CheckAndCleanCache() noexcept;

        std::string model_dir_{ "." };

        static std::unordered_map<std::string, std::shared_ptr<Model>> model_cache_;
        static std::mutex model_cache_mutex_;
    };  // class ModelManager

}  // namespace infer_server

#endif  // INFER_SERVER_MODEL_H_

4.2 ./infer_server/src/model/model.cpp

/*************************************************************************
 * Copyright (C) [2020] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#include "model.h"

#include <glog/logging.h>
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "core/data_type.h"
#include "common/utils.hpp"

using std::string;
using std::vector;

namespace infer_server {

    bool ModelRunner::Init(TEngine engine) noexcept{
        if (engine == nullptr) return false;
        engine_ = std::move(engine);
        return true;
    }

        Status ModelRunner::Run(ModelIO* in, ModelIO* out) noexcept{  // NOLINT
        auto& input = in->surfs;
        auto& output = out->surfs;
        CHECK_EQ(input_num_, input.size()) << "[InferServer] [ModelRunner] Input number is mismatched";

        VLOG(5) << "[InferServer] [ModelRunner] Process inference once, input num: " << input_num_ << " output num: "
            << output_num_;

        std::vector<trt::NetInoutLayerData> inputData(input_num_);
        std::vector<trt::NetInoutLayerData> outputData(output_num_);

        for (uint32_t i_idx = 0; i_idx < input_num_; ++i_idx) {
            CHECK_EQ(input_num_, input.size()) << "[InferServer] [ModelRunner] Input number is mismatched";

            inputData[i].data = input[i_idx]->GetData(0);
            inputData[i].size = input[i_idx]->GetSize(0);
            inputData[i].layer_idx = i_idx;
        }

        for (uint32_t o_idx = 0; o_idx < output_num_; ++o_idx) {
            CHECK_EQ(output_num_, output.size()) << "[InferServer] [ModelRunner] Output number is mismatched";

            outputData[i].data = input[o_idx]->GetData(0);
            outputData[i].size = input[o_idx]->GetSize(0);
            outputData[i].layer_idx = o_idx;
        }

        uint32_t batchsize = input[0]->GetNumFilled();
        trt::ErrInfo ei{};
        _SAFECALL(trt::net_do_inference(engine_.get(), batchsize, inputData.data(), inputData.size(), outputData.data(), outputData.size(), &ei)
                  , "[InferServer] [ModelRunner] Infer failed.", Status::ERROR_BACKEND);

        return Status::SUCCESS;
    }

        bool Model::Init(const string& model_file) noexcept{
        model_file_ = model_file;

        trt::ErrInfo ei{};
        trt::trtnet_t* net = trt::load_net_from_file(model_file_.data(), &ei);
        CHECK(!net) << "trt::load_net_from_file failed: model file: " << te2fullpath << ", error: " << ei.errmsg;

        has_init_ = GetModelInfo(model);

        VLOG(1) << "[InferServer] [Model] (success) Load model from file: " << model_file_;

        trt::release_net(net);

        return has_init_;
    }

        bool Model::GetModelInfo(trt::trtnet_t* net) noexcept{
        VLOG(1) << "[InferServer] [Model] (success) Load model from graph file: " << model_file_;

        int model_batch_size_ = trt::net_max_batch_size(net);

        // get IO messages
        // get io number and data size
        i_num_ = trt::net_num_inputs(net);
        o_num_ = trt::net_num_outputs(net);

        // get input info
        for (int i = 0; i < ninp; i++) {
            trt::LayerDims ldim{};
            trt::net_input_layer_dims(net, i, &ldim);
            input_shapes_.emplace_back(std::move(Shape({ std::max(ldim.n, model_batch_size_), ldim.c, ldim.h, ldim.w })));

            DataLayout layout;
            layout.dtype = DataType::FLOAT;
            layout.order = DimOrder::NCHW;
            i_layouts_.emplace_back(std::move(layout));
        }

        // get output info
        int noup = net_num_outputs(net);
        runner->SetOutputNum(noup);
        for (int i = 0; i < noup; i++) {
            trt::LayerDims ldim{};
            trt::net_output_layer_dims(net, i, &ldim);
            output_shapes_.emplace_back(std::move(Shape({ std::max(ldim.n, model_batch_size_), ldim.c, ldim.h, ldim.w })));

            DataLayout layout;
            layout.dtype = DataType::FLOAT;
            layout.order = DimOrder::NCHW;
            o_layouts_.emplace_back(std::move(layout));
        }

        VLOG(1) << "[InferServer] [Model] Model Info: input number = " << i_num_ << ";\toutput number = " << o_num_;
        VLOG(1) << "[InferServer] [Model]             batch size = " << model_batch_size_;
        for (int i = 0; i < i_num_; ++i) {
            VLOG(1) << "[InferServer] [Model] ----- input index [" << i;
            VLOG(1) << "[InferServer] [Model]       data type " << detail::DataTypeStr(i_layouts_[i].dtype);
            VLOG(1) << "[InferServer] [Model]       dim order " << detail::DimOrderStr(i_layouts_[i].order);
            VLOG(1) << "[InferServer] [Model]       shape " << input_shapes_[i];
        }
        for (int i = 0; i < o_num_; ++i) {
            VLOG(1) << "[InferServer] [Model] ----- output index [" << i;
            VLOG(1) << "[InferServer] [Model]       data type " << detail::DataTypeStr(o_layouts_[i].dtype);
            VLOG(1) << "[InferServer] [Model]       dim order " << detail::DimOrderStr(o_layouts_[i].order);
            VLOG(1) << "[InferServer] [Model]       shape " << output_shapes_[i];
        }
        return true;
    }

        Model::~Model() {
        VLOG(1) << "[InferServer] [Model] Unload model: " << model_file_;
    }
}  // namespace infer_server

5 其他代码修改

其他的还有很多零碎代码,比如一些命名空间的名字,还有一些其他名称,还有很多文件里面调用的一些函数的名字,太乱了,不写到博客里面了。

参考文献:

在NVIDIA Jetson AGX Orin中使用jetson-ffmpeg调用硬件编解码加速处理-CSDN博客

NVIDIA Jetson AGX Orin源码编译安装CV-CUDA-CSDN博客

GitHub - Cambricon/CNStream: CNStream is a streaming framework for building Cambricon machine learning pipelines http://forum.cambricon.com https://gitee.com/SolutionSDK/CNStream

easydk/samples/simple_demo/common/video_decoder.cpp at master · Cambricon/easydk · GitHub

aclStream流处理多路并发Pipeline框架中 视频解码 代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客

aclStream流处理多路并发Pipeline框架中VEncode Module代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客

FFmpeg/doc/examples at master · FFmpeg/FFmpeg · GitHub

GitHub - CVCUDA/CV-CUDA: CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

如何使用FFmpeg的解码器—FFmpeg API教程 · FFmpeg原理

C++ API — CV-CUDA Beta documentation (cvcuda.github.io)

CV-CUDA/tests/cvcuda/system at main · CVCUDA/CV-CUDA · GitHub

Resize — CV-CUDA Beta documentation

CUDA Runtime API :: CUDA Toolkit Documentation

CUDA Toolkit Documentation 12.6 Update 1

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2103810.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

Kafka【五】Buffer Cache (缓冲区缓存)、Page Cache (页缓存)和零拷贝技术

【1】Buffer Cache (缓冲区缓存) 在Linux操作系统中&#xff0c;Buffer Cache&#xff08;缓冲区缓存&#xff09;是内核用来优化对块设备&#xff08;如磁盘&#xff09;读写操作的一种机制&#xff08;故而有一种说法叫做块缓存&#xff09;。尽管在较新的Linux内核版本中&a…

Spring Cloud全解析:熔断之Hystrix服务监控

Hystrix服务监控 Hystrix除了熔断降级之外&#xff0c;还提供了准实时的调用监控&#xff0c;持续的记录所有通过Hystrix发起的请求的执行信息&#xff0c;并以统计报表的形式展示出来&#xff0c;包括有每秒执行多少请求&#xff0c;多少成功&#xff0c;多少失败等&#xff…

【C++】vector类:模拟实现(适合新手手撕vector)

在实现本文的vector模拟前&#xff0c;建议先了解关于vector的必要知识&#xff1a;【C】容器vector常用接口详解-CSDN博客https://blog.csdn.net/2301_80555259/article/details/141529230?spm1001.2014.3001.5501 目录 一.基本结构 二.构造函数&#xff08;constructor&…

【算法】位运算

【ps】本篇有 10 道 leetcode OJ。 目录 一、算法简介 二、相关例题 1&#xff09;位1的个数 .1- 题目解析 .2- 代码编写 2&#xff09;比特位计数 .1- 题目解析 .2- 代码编写 3&#xff09;汉明距离 .1- 题目解析 .2- 代码编写 4&#xff09;只出现一次的数字 .…

3000字带你了解SD提示词用法,一点就通,小白轻松上手(附提示词生成器)(1.4 SD提示词运用)

提示词是什么 提示词是我们向AI模型发出的指令。正确的提示词能让AI准确反馈所需的输出&#xff0c;而优质的提示词则能使AI生成的内容更优质、更符合你的期望。这与编写程序代码颇为相似&#xff0c;准确的代码逻辑是程序正常运行的前提&#xff0c;而优秀的代码则能减少运行…

知识付费小程序源码轻松实现一站式运营,开启知识变现之旅

技术栈&#xff1a; 以下是一个简单的知识付费小程序的示例代码&#xff1a; app.js&#xff1a;小程序的入口文件 App({onLaunch: function () {// 在小程序启动时执行的代码},globalData: {// 存储全局数据userInfo: null // 用户信息} })pages/index/index.js&#xff1…

【学术会议征稿】第四届智能电网与能源互联网国际会议(SGEI 2024)

第四届智能电网与能源互联网国际会议&#xff08;SGEI 2024&#xff09; 2024 4th International Conference on Smart Grid and Energy Internet 为交流近年来国内外在智能电网和能源互联网领域的理论、技术和应用的最新进展&#xff0c;展示最新成果&#xff0c;由沈阳工业…

Visual Studio 2022 下载和安装

文章目录 概述一&#xff0c;下载步骤二&#xff0c;安装过程 概述 Visual Studio 提供 AI 增强功能&#xff0c;例如用于上下文感知代码补全的 IntelliSense 和可利用开源代码中的 AI 模式的 IntelliCode。 集成的 GitHub Copilot 提供 AI 支持的代码补全、聊天辅助、调试建议…

ElasticSearch学习笔记(三)RestClient操作文档、DSL查询文档、搜索结果排序

文章目录 前言5 RestClient操作文档5.4 删除文档5.4 修改文档5.5 批量导入文档 6 DSL查询文档6.1 准备工作6.2 全文检索查询6.3 精准查询6.4 地理坐标查询6.5 复合查询6.5.1 相关性算分6.5.2 布尔查询 7 搜索结果处理7.1 排序7.1.1 普通字段排序7.1.2 地理坐标排序 前言 Elast…

qmt量化交易策略小白学习笔记第59期【qmt编程之期权数据--获取指定期权品种的详细信息--原生Python】

qmt编程之获取期权数据 qmt更加详细的教程方法&#xff0c;会持续慢慢梳理。 也可找寻博主的历史文章&#xff0c;搜索关键词查看解决方案 &#xff01; 基于BS模型计算欧式期权理论价格 基于Black-Scholes-Merton模型&#xff0c;输入期权标的价格、期权行权价、无风险利率…

Mac 安装Hadoop教程(HomeBrew安装)

1. 引言 本教程旨在介绍在Mac 电脑上安装Hadoop&#xff0c;便于编程开发人员对大数据技术的熟悉和掌握。 2.前提条件 2.1 安装JDK 想要在你的Mac电脑上安装Hadoop&#xff0c;你必须首先安装JDK。具体安装步骤这里就不详细描述了。你可参考Mac 安装JDK8。 2.2 配置ssh环境…

从腰子的营养成分来分析腰子能否“壮阳”,健康地吃腰子。

文章目录 引言I 腰子的营养优点缺点吃腰子无“壮阳”效果II 健康地吃腰子食用前充分清洗浸泡高尿酸及痛风群体慎吃适量吃引言 很多人认为动物内脏有着“以形补形”的好处,如吃动物腰子,能补肾、壮阳,这让很多人对“腰子”非常热爱。 腰子的营养到底如何?经常吃腰子对身体…

优思学院:FMEA与FTA故障树方法对比:工程师必须知道的关键点!

故障树分析&#xff08;Fault Tree Analysis, FTA&#xff09;以某个不希望发生的产品故障事件或严重的系统风险&#xff08;即顶事件&#xff09;为分析对象&#xff0c;通过自上而下的分层次因果逻辑分析&#xff0c;逐步找出导致故障事件的必要且充分的直接原因&#xff0c;…

日程安排组件DHTMLX Scheduler v7.1 - 支持RFC-5545格式

DHTMLX Scheduler是一个类似于Google日历的JavaScript日程安排控件&#xff0c;日历事件通过Ajax动态加载&#xff0c;支持通过拖放功能调整事件日期和时间&#xff0c;事件可以按天、周、月三个种视图显示。 此版本包括几个备受期待的特性&#xff0c;可以帮助用户增强DHTMLX…

基于php+vue+uniapp的医院预约挂号系统小程序

开发语言&#xff1a;PHP框架&#xff1a;phpuniapp数据库&#xff1a;mysql 5.7&#xff08;一定要5.7版本&#xff09;数据库工具&#xff1a;Navicat11开发软件&#xff1a;PhpStorm 系统展示 后台登录界面 管理员功能界面 用户管理 医生管理 科室分类管理 医生信息管理 预…

30s到0.8s,记录一次接口优化成功案例!

大家好&#xff0c;我是沐子&#xff0c;推荐一个程序员免费学习的编程网站 我爱编程网&#xff08;www.love-coding.com&#xff09; ** 场景** 在高并发的数据处理场景中&#xff0c;接口响应时间的优化显得尤为重要。本文将分享一个真实案例&#xff0c;其中一个数据量达到…

(5) 归并排序

归并排序 归并排序是一种分治策略的排序算法。它是一种比较特殊的排序算法&#xff0c;通过递归地先使每个子序列有序&#xff0c;再将两个有序的序列进行合并成一个有序的序列。 归并排序首先由著名的现代计算机之父 John_von_Neumann 在 1945 年发明&#xff0c;被用在了 E…

【Python】Python 读取Excel、DataFrame对比并选出差异数据,重新写入Excel

背景&#xff1a;我在2个系统下载出了两个Excel&#xff0c;现在通过对下载的2个Excel数据&#xff0c;并选出差异数据 从新写入一个新的Excel中 differences_url rC:\Users\LENOVO\Downloads\differences.xlsx; //要生成的差异Excel的位置及名称 df1_url rC:\Users\LENOVO\Dow…

终于知道如何简化时间序列的特征工程了!

在处理时间序列数据时&#xff0c;时间特征往往是最基础且独特的要素&#xff0c;我们的目标通常是预测某种未来的响应或结果。 不过在很多情况下&#xff0c;除了时间特征之外&#xff0c;我们还能获取到一系列其他相关的特征或变量。 时间序列数据中的特征工程涉及从原始时…

进程、线程、时间片

1、操作系统中的程序&#xff08;如微信&#xff09;在运行时&#xff0c;系统会产生一个或多个进程&#xff0c;往往是一个 2、进程内可以包含多个线程&#xff0c;有一个主线程&#xff0c;主线程结束时&#xff0c;进程结束&#xff0c;进而程序结束 3、线程是cpu调度执行…