目录
1 视频解码代码编写----利用jetson-ffmpeg
1.1 nvstream中视频解码的代码流程框架
1.1.1 类的层次关系
1.1.2 各个类的初始化函数调用层次关系
1.1.3 各个类的process函数调用层次关系
1.2 编写视频解码代码
1.2.1 修改VideoInfo结构体定义
1.2.2 修改解封装代码
1.2.3 decode_impl_nv.hpp
1.2.4 decode_impl_nv.cpp
2 硬件相关的图像格式、内存申请接口、内存释放、内存释放等代码修改
2.1 infer_server/include/common/utils.hpp文件内容修改
2.2 cuda的四种内存
2.3 infer_server/src/core/device.cpp修改
3 图像缩放、裁剪、色域转换等代码编写----利用CV-CUDA
3.1 nvstream中图像处理代码流程框架
3.1.1 类的层次关系
3.1.2 各个类的初始化函数调用层次关系
3.1.3 各个类的transform函数调用层次关系
3.2 编写图像缩放、裁剪、色域转换等代码
3.2.1 infer_server/src/nv/transform_impl_nv.hpp
3.2.2 infer_server/src/nv/transform_impl_nv.cpp
4 算法推理相关代码修改
4.1 ./infer_server/src/model/model.h
4.2 ./infer_server/src/model/model.cpp
5 其他代码修改
参考文献:
记录下将CNStream流处理多路并发Pipeline框架适配到NVIDIA Jetson AGX Orin的过程,以及过程中遇到的问题,我的jetson盒子是用jetpack5.1.3重新刷机之后的,这是系列博客的第二篇。
另外下面的代码只是初步代码,未编译,未调试。
1 视频解码代码编写----利用jetson-ffmpeg
用jetson-ffmpeg来做视频解码,本质上还是用ffmpeg解码,是不是jetson-ffmpeg底层调用了英伟达的nvmpi做硬件加速。
1.1 nvstream中视频解码的代码流程框架
视频解码的代码流程框架在博客:aclStream流处理多路并发Pipeline框架中 视频解码 代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客
上面博客里面的是详细代码阅读,从中提炼出来简单点的框架
1.1.1 类的层次关系
class FileHandler类里面有个FileHandlerImpl *impl_ = nullptr成员
-->class FileHandlerImpl
-->class FFParser 解封装的类
-->class DeviceDecoder
-->class DecodeService
-->class IDecoder 只是个虚基类,被用来继承的
-->IDecoder *decoder_ = CreateDecoder();这里面就是new DecoderAcl()了。 DecoderAcl继承IDecoder
-->再往下就是各种的硬件相关的类了,
上面是类的关系,这次英伟达平台上就是从IDecoder *decoder_ = CreateDecoder();开始写一个新的类然后用来做具体的解码,上层的那些类还是保持不变的。
1.1.2 各个类的初始化函数调用层次关系
FileHandlerImpl::Open()
std::thread(&FileHandlerImpl::Loop
PrepareResources()
parser_.Open
impl_->Open(url, result, only_key_frame);
result_->OnParserInfo(info);
decoder_->Create(info, &extra)这个decoder_就是DeviceDecoder类
VdecCreate(void **vdec, VdecCreateParams *params) 这是个单纯的函数,不在任何类里面,函数内容在下一行
infer_server::DecodeService::Instance().Create(vdec, params);
DecodeService::Create里面有下面两行
IDecoder *decoder_ = CreateDecoder()先创建DecoderAcl
decoder_->Create(params)然后调用DecoderAcl 的create函数,
再往下就是硬件相关的各种类的init和open函数了,这次英伟达的也是要创建个新的类,然后在这里新类的
open或者init函数被调用
1.1.3 各个类的process函数调用层次关系
process分两个方向,一个是从上到下送解码数据的,另一个是解码完之后获取解码数据,然后从下往上回传的。先看从上到下送frame的流程
bool FileHandlerImpl::Process() {
parser_.Parse();
impl_->Parse();
result_->OnParserFrame(&frame);
DeviceDecoder::Process(VideoEsPacket *pkt) {
int VdecSendStream(void *vdec, const VdecStream *stream, int timeout_ms)
{
return infer_server::DecodeService::Instance().SendStream(vdec, stream, timeout_ms);
}
decoder_->SendStream(stream, timeout_ms);这就已经是DecoderAcl的sendstream了,已经到硬件了。
vdec_->Decode(data_ptr, data_size, frame_id, this);vdec_是AclLiteVideoProc
再往下就不看了,下面几层类都是硬件解码的。
然后看一下从下往上回传的,就是解码之后的数据一层层传到上层的类。
VideoDecoder::DvppVdecCallbackV2(hi_video_frame_info *frame, void *userdata) {
CallBackVdec(const std::shared_ptr<acllite::ImageData> decoded_image, uint32_t channel_id, uint32_t frame_id, v
decoder->OnFrame(decoded_image, channel_id, frame_id);这个decoder就是DecoderAcl类,
create_params_.OnFrame(surf, create_params_.userdata);
OnFrame_(BufSurface *surf, void *userdata)class DeviceDecoder类
result_->OnDecodeFrame(wrapper);
FileHandlerImpl::OnDecodeFrame
1.2 编写视频解码代码
从上面分析可以知道,需要写一个类,替换掉之前的底层硬件解码类,在jetson上,用jetson-ffmpeg做解码,jetson-ffmpeg会调用硬件加速。
1.2.1 修改VideoInfo结构体定义
首先修改一个结构体的定义,因为之前的cnstream中解码不是用ffmpeg解码的,他只是用ffmpeg解封装,所以这个结构体定义是这样的modules/source/src/video_parser.hpp,他由于不用ffmpeg解码所以不需要 AVCodecParameters* codecpar = nullptr成员。
namespace cnstream {
struct VideoInfo {
AVCodecID codec_id;
#ifdef HAVE_FFMPEG_AVDEVICE // for usb camera
int format;
int width;
int height;
#endif
int progressive;
std::vector<unsigned char> extra_data;
};
...其他代码...
然后在解码的demo那里,easydk/samples/simple_demo/common/video_parser.h,还有一个结构体的定义。
struct VideoInfo {
AVCodecID codec_id = AV_CODEC_ID_NONE;
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
AVCodecParameters* codecpar = nullptr;
#endif
AVCodecContext* codec_ctx = nullptr;
std::vector<uint8_t> extra_data{};
int width = 0;
int height = 0;
int progressive = 0;
};
所以我这里要修改一下这个结构体的定义,把第一个的结构体定义改成下面的格式。
namespace cnstream {
struct VideoInfo {
AVCodecID codec_id;
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
AVCodecParameters* codecpar = nullptr;
#endif
AVCodecContext* codec_ctx = nullptr;
std::vector<unsigned char> extra_data;
int width;
int height;
int progressive;
#ifdef HAVE_FFMPEG_AVDEVICE // for usb camera
int format;
#endif
};
1.2.2 修改解封装代码
然后解封装那里的代码,需要修改一下给AVCodecParameters* codecpar成员赋值。参考CNStream中easydk/samples/simple_demo/common/video_parser.cpp的代码去修改NVStream中modules/source/src/video_parser.cpp的代码,
int Open(const std::string& url, IParserResult* result, bool only_key_frame = false) {
std::unique_lock<std::mutex> guard(mutex_);
if (!result) return -1;
result_ = result;
// format context
fmt_ctx_ = avformat_alloc_context();
if (!fmt_ctx_) {
return -1;
}
url_name_ = url;
AVInputFormat* ifmt = NULL;
// for usb camera
#ifdef HAVE_FFMPEG_AVDEVICE
const char* usb_prefix = "/dev/video";
if (0 == strncasecmp(url_name_.c_str(), usb_prefix, strlen(usb_prefix))) {
// open v4l2 input
#if defined(__linux) || defined(__unix)
ifmt = av_find_input_format("video4linux2");
if (!ifmt) {
LOGE(SOURCE) << "[" << stream_id_ << "]: Could not find v4l2 format.";
return false;
}
#elif defined(_WIN32) || defined(_WIN64)
ifmt = av_find_input_format("dshow");
if (!ifmt) {
LOGE(SOURCE) << "[" << stream_id_ << "]: Could not find dshow.";
return false;
}
#else
LOGE(SOURCE) << "[" << stream_id_ << "]: Unsupported Platform";
return false;
#endif
}
#endif
int ret_code;
const char* p_rtsp_start_str = "rtsp://";
if (0 == strncasecmp(url_name_.c_str(), p_rtsp_start_str, strlen(p_rtsp_start_str))) {
AVIOInterruptCB intrpt_callback = { InterruptCallBack, this };
fmt_ctx_->interrupt_callback = intrpt_callback;
last_receive_frame_time_ = GetTickCount();
// options
av_dict_set(&options_, "buffer_size", "1024000", 0);
av_dict_set(&options_, "max_delay", "500000", 0);
av_dict_set(&options_, "stimeout", "20000000", 0);
av_dict_set(&options_, "rtsp_flags", "prefer_tcp", 0);
rtsp_source_ = true;
}
else {
// options
av_dict_set(&options_, "buffer_size", "1024000", 0);
av_dict_set(&options_, "max_delay", "500000", 0);
}
// open input
ret_code = avformat_open_input(&fmt_ctx_, url_name_.c_str(), ifmt, &options_);
if (0 != ret_code) {
LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't open input stream -- " << url_name_;
return -1;
}
// find video stream information
ret_code = avformat_find_stream_info(fmt_ctx_, NULL);
if (ret_code < 0) {
LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't find stream information -- " << url_name_;
return -1;
}
// fill info
VideoInfo video_info;
VideoInfo* info = &video_info;
int video_index = -1;
AVStream* st = nullptr;
for (uint32_t loop_i = 0; loop_i < fmt_ctx_->nb_streams; loop_i++) {
st = fmt_ctx_->streams[loop_i];
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
#else
if (st->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
#endif
video_index = loop_i;
break;
}
}
if (video_index == -1) {
LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't find a video stream -- " << url_name_;
return -1;
}
video_index_ = video_index;
info->width = st->codecpar->width;
info->height = st->codecpar->height;
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
info->codec_id = st->codecpar->codec_id;
int field_order = st->codecpar->field_order;
#ifdef HAVE_FFMPEG_AVDEVICE // for usb camera
info->format = st->codecpar->format;
#endif
#else
info->codec_id = st->codec->codec_id;
int field_order = st->codec->field_order;
#ifdef HAVE_FFMPEG_AVDEVICE // for usb camera
info->format = st->codec->format;
#endif
#endif
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
info->codecpar = fmt_ctx_->streams[video_index_]->codecpar;
#endif
info->codec_ctx = fmt_ctx_->streams[video_index_]->codec;
/*At this moment, if the demuxer does not set this value (avctx->field_order == UNKNOWN),
* the input stream will be assumed as progressive one.
*/
switch (field_order) {
case AV_FIELD_TT:
case AV_FIELD_BB:
case AV_FIELD_TB:
case AV_FIELD_BT:
info->progressive = 0;
break;
case AV_FIELD_PROGRESSIVE: // fall through
default:
info->progressive = 1;
break;
}
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
unsigned char* extradata = st->codecpar->extradata;
int extradata_size = st->codecpar->extradata_size;
#else
unsigned char* extradata = st->codec->extradata;
int extradata_size = st->codec->extradata_size;
#endif
if (extradata && extradata_size) {
info->extra_data.resize(extradata_size);
memcpy(info->extra_data.data(), extradata, extradata_size);
}
// bitstream filter
bsf_ctx_ = nullptr;
const AVBitStreamFilter *pfilter{};
if (strstr(fmt_ctx_->iformat->name, "mp4") || strstr(fmt_ctx_->iformat->name, "flv") ||
strstr(fmt_ctx_->iformat->name, "matroska")) {
if (AV_CODEC_ID_H264 == info->codec_id) {
pfilter = av_bsf_get_by_name("h264_mp4toannexb");
}
else if (AV_CODEC_ID_HEVC == info->codec_id) {
pfilter = av_bsf_get_by_name("hevc_mp4toannexb");
}
else {
pfilter = nullptr;
}
}
if (pfilter == nullptr) {
bsf_ctx_ = nullptr;
}
else {
av_bsf_alloc(pfilter, &bsf_ctx_);
}
if (result_) {
result_->OnParserInfo(info);
}
av_init_packet(&packet_);
first_frame_ = true;
eos_reached_ = false;
open_success_ = true;
only_key_frame_ = only_key_frame;
return 0;
}
1.2.3 decode_impl_nv.hpp
创建个新的类,用来做视频解码,初步代码如下,后面编译以及运行的时候有错误再修改。
#ifndef _DECODE_IMPL_NV_HPP_
#define _DECODE_IMPL_NV_HPP_
#include <atomic>
#include <string>
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include "../decode_impl.hpp"
namespace infer_server {
class DecodeFFmpeg : public IDecoder {
public:
DecodeFFmpeg() = default;
~DecodeFFmpeg() = default;
int Create(VdecCreateParams *params) override;
int Destroy() override;
int SendStream(const VdecStream *stream, int timeout_ms) override;
void OnFrame(AVFrame *av_frame_, uint32_t frame_id) override;
void OnEos() override;
void OnError(int errcode) override;
private:
void ResetFlags();
private:
std::atomic<bool> eos_sent_{false}; // flag for acl eos has been sent to decoder
std::atomic<bool> created_{false};
VdecCreateParams create_params_;
AVCodec *av_codec_;
AVCodecContext* codec_context_{nullptr};
AVFrame *av_frame_ = nullptr;
void *transformer_{};
};
} // namespace
#endif // DECODE_IMPL_NV_HPP
1.2.4 decode_impl_nv.cpp
#include <iostream>
#include "glog/logging.h"
#include "decode_impl_nv.hpp"
#include "defer.hpp"
namespace infer_server {
bool DecodeFFmpeg::Create(VdecCreateParams *params) {
create_params_ = *params;
switch (params->type) {
case VDEC_TYPE_H264:
av_codec_ = avcodec_find_decoder(AV_CODEC_ID_H264);
break;
case VDEC_TYPE_H265:
av_codec_ = avcodec_find_decoder(AV_CODEC_ID_H265);
break;
case VDEC_TYPE_JPEG:
default:
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Create(): Unsupported codec type: " << create_params_.type;
return -1;
}
if (!av_codec_) {
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] avcodec_find_decoder failed";
return false;
}
codec_context_ = avcodec_alloc_context3(av_codec_);
if (!codec_context_) {
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Failed to do avcodec_alloc_context3";
return false;
}
AVDictionary *decoder_opts = nullptr;
defer([&decoder_opts] {
if (decoder_opts) av_dict_free(&decoder_opts);
});
av_dict_set_int(&decoder_opts, "device_id", 0, 0);
if (avcodec_open2(codec_context_, av_codec_, &decoder_opts) < 0) {
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Failed to open codec";
return false;
}
av_frame_ = av_frame_alloc();
if (!av_frame_) {
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Could not alloc frame";
return false;
}
ResetFlags();
//待修改
//这里还需要增加创建transform用来做图像缩放的代码。
created_ = true;
return true;
}
int DecodeFFmpeg::SendStream(const VdecStream *stream, int timeout_ms) {
if (!created_) {
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] SendStream(): Decoder is not created";
return -1;
}
if (nullptr == stream || nullptr == stream->bits) {
if (eos_sent_) {
LOG(WARNING) << "[InferServer] [DecodeFFmpeg] SendStream(): EOS packet has been send";
return 0;
}
AVPacket framePacket = {};
av_init_packet(&framePacket);
framePacket.data = nullptr;
framePacket.size = 0;
avcodec_send_packet(decode_, &packet);
// flush all frames ...
int ret = 0;
do {
ret = avcodec_receive_frame(decode_, av_frame_);
if(ret >= 0)
{
OnFrame(av_frame_, stream->pts);
}
} while (ret >= 0);
eos_sent_ = true;
OnEos();
}
else{
if (eos_sent_) {
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] SendStream(): EOS has been sent, process packet failed, pts:"
<< stream->pts;
return -1;
}
AVPacket framePacket = {};
av_init_packet(&framePacket);
framePacket.data = stream.bits;
framePacket.size = stream.len;
//开始解码
int ret = avcodec_send_packet(codec_context_, framePacket);
if (ret < 0) {
LOG(ERROR) << "[InferServer] [DecodeFFmpeg] avcodec_send_packet failed, data ptr, size:"
<< framePacket->data << ", " << framePacket->size;
return false;
}
ret = avcodec_receive_frame(codec_context_, av_frame_);
OnFrame(av_frame_, stream->pts);
}
}
void DecodeFFmpeg::OnFrame(AVFrame *av_frame_, uint32_t frame_id) {
BufSurface *surf = nullptr;
if (create_params_.GetBufSurf(&surf, av_frame_->width, av_frame_->height, CastColorFmt(av_frame_->format),
create_params_.surf_timeout_ms, create_params_.userdata) < 0) {
LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): Get BufSurface failed";
OnError(-1);
return;
}
if (surf->mem_type != BUF_MEMORY_DVPP) {
LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): BufSurface memory type must be BUF_MEMORY_DVPP";
return;
}
switch (av_frame_->format) {
case acllite::ImageFormat::YUV_SP_420:
case acllite::ImageFormat::YVU_SP_420:
if (surf->surface_list[0].width != av_frame_->width || surf->surface_list[0].height != av_frame_->height) {
BufSurface transform_src;
BufSurfaceParams src_param;
memset(&transform_src, 0, sizeof(BufSurface));
memset(&src_param, 0, sizeof(BufSurfaceParams));
src_param.color_format = CastColorFmt(av_frame_->format);
src_param.data_size = codec_image->size;//待修改。
src_param.data_ptr = reinterpret_cast<void *>(codec_image->data.get());//待修改。
VLOG(5) << "[InferServer] [DecoderAcl] OnFrame(): codec_frame: "
<< " width = " << av_frame_->width
<< ", height = " << av_frame_->height
<< ", width stride = " << av_frame_->alignWidth
<< ", height stride = " << av_frame_->alignHeight;
VLOG(5) << "[InferServer] [DecoderAcl] OnFrame(): surf->surface_list[0]: "
<< " width = " << surf->surface_list[0].width
<< ", height = " << surf->surface_list[0].height
<< ", width stride = " << surf->surface_list[0].width_stride
<< ", height stride = " << surf->surface_list[0].height_stride;
src_param.width = av_frame_->width;
src_param.height = av_frame_->height;
src_param.width_stride = codec_image->alignWidth;//待修改。
src_param.height_stride = codec_image->alignHeight;//待修改。
transform_src.batch_size = 1;
transform_src.num_filled = 1;
transform_src.device_id = create_params_.device_id;
transform_src.mem_type = BUF_MEMORY_DVPP;
transform_src.surface_list = &src_param;
TransformParams trans_params;
memset(&trans_params, 0, sizeof(trans_params));
trans_params.transform_flag = TRANSFORM_RESIZE_SRC;
if (Transform(transformer_, &transform_src, surf, &trans_params) < 0) {
LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): Transfrom failed";
break;
}
}
else {
std::chrono::high_resolution_clock::time_point tnow = std::chrono::high_resolution_clock::now();
CALL_ACL_FUNC(acllite::CopyDataToHostEx(surf->surface_list[0].data_ptr, codec_image->size, codec_image->data.get(), codec_image->size, codec_image->deviceId)
, "[DecoderAcl] OnFrame(): copy codec buffer data to surf failed");
std::chrono::high_resolution_clock::time_point tpost = std::chrono::high_resolution_clock::now();
//std::cout << "<<<<<<================================ CopyDataToHostEx time = " << std::chrono::duration_cast<std::chrono::duration<double>>(tpost - tnow).count() * 1000 << " ms" << std::endl;
}
break;
default:
break;
}
surf->pts = frame_id;
//std::chrono::high_resolution_clock::time_point tnow = std::chrono::high_resolution_clock::now();
create_params_.OnFrame(surf, create_params_.userdata);
//std::chrono::high_resolution_clock::time_point tpost = std::chrono::high_resolution_clock::now();
//std::cout << "<<<<<<================================ create_params_.OnFrame time = " << std::chrono::duration_cast<std::chrono::duration<double>>(tpost - tnow).count() * 1000 << " ms" << std::endl;
}
void DecodeFFmpeg::Destroy() {
if (!created_) {
LOG(WARNING) << "[InferServer] [DecoderAcl] Destroy(): Decoder is not created";
return 0;
}
// if error happened, destroy directly, eos maybe not be transmitted from the decoder
if (!eos_sent_) {
SendStream(nullptr, 10000);
}
ResetFlags();
if (av_frame_) {
av_frame_free(&av_frame_);
av_frame_ = nullptr;
}
if (codec_context_) {
avcodec_close(codec_context_);
avcodec_free_context(&codec_context_);
codec_context_ = nullptr;
}
//待修改,还需要增加销毁transform的相关代码。
}
DecodeFFmpeg::~DecodeFFmpeg() {
DecodeFFmpeg::Destroy();
}
void DecodeFFmpeg::ResetFlags() {
eos_sent_ = false;
created_ = false;
}
void DecodeFFmpeg::OnEos() {
create_params_.OnEos(create_params_.userdata);
}
void DecodeFFmpeg::OnError(int errcode) {
//convert the error code
create_params_.OnError(static_cast<int>(errcode), create_params_.userdata);
}
} // namespace
2 硬件相关的图像格式、内存申请接口、内存释放、内存释放等代码修改
infer_server/include/common/utils.hpp文件内容如下
/*************************************************************************
* Copyright (C) [2022] by Cambricon, Inc. All rights reserved
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*************************************************************************/
#ifndef COMMON_UTILS_HPP_
#define COMMON_UTILS_HPP_
#include <string>
#include <glog/logging.h>
#include "buf_surface.h"
#include "AclLite/AclLite.h"
#define _SAFECALL(func, expected, msg, ret_val) \
do { \
int _ret = (func); \
if ((expected) != _ret) { \
LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg; \
return (ret_val); \
} \
} while (0)
#define ACL_SAFECALL(func, msg, ret_val) _SAFECALL(func, acllite::ACLLITE_OK, msg, ret_val)
#define _CALLFUNC(func, expected, msg) \
do { \
int _ret = (func); \
if ((expected) != _ret) { \
LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg; \
} \
} while (0)
#define CALL_ACL_FUNC(func, msg) _CALLFUNC(func, acllite::ACLLITE_OK, msg)
inline BufSurfaceMemType CastMemoryType(acllite::MemoryType type) noexcept{
switch (type) {
#define RETURN_MEMORY_TYPE(type) \
case acllite::MemoryType::type: \
return BUF_##type;
RETURN_MEMORY_TYPE(MEMORY_HOST)
RETURN_MEMORY_TYPE(MEMORY_DEVICE)
RETURN_MEMORY_TYPE(MEMORY_DVPP)
RETURN_MEMORY_TYPE(MEMORY_NORMAL)
#undef RETURN_MEMORY_TYPE
default:
LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
return BUF_MEMORY_HOST;
}
}
inline acllite::MemoryType CastMemoryType(BufSurfaceMemType type) noexcept{
switch (type) {
#define RETURN_MEMORY_TYPE(type) \
case BUF_##type: \
return acllite::MemoryType::type;
RETURN_MEMORY_TYPE(MEMORY_HOST)
RETURN_MEMORY_TYPE(MEMORY_DEVICE)
RETURN_MEMORY_TYPE(MEMORY_DVPP)
RETURN_MEMORY_TYPE(MEMORY_NORMAL)
#undef RETURN_MEMORY_TYPE
default:
LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
return acllite::MemoryType::MEMORY_HOST;
}
}
inline BufSurfaceColorFormat CastColorFmt(acllite::ImageFormat format) {
static std::map<acllite::ImageFormat, BufSurfaceColorFormat> color_map{
{ acllite::ImageFormat::YUV_SP_420, BUF_COLOR_FORMAT_NV12 },
{ acllite::ImageFormat::YVU_SP_420, BUF_COLOR_FORMAT_NV21 },
{ acllite::ImageFormat::RGB_888, BUF_COLOR_FORMAT_RGB },
{ acllite::ImageFormat::BGR_888, BUF_COLOR_FORMAT_BGR },
};
return color_map[format];
}
inline acllite::ImageFormat CastColorFmt(BufSurfaceColorFormat format) {
static std::map<BufSurfaceColorFormat, acllite::ImageFormat> color_map{
{ BUF_COLOR_FORMAT_NV12, acllite::ImageFormat::YUV_SP_420 },
{ BUF_COLOR_FORMAT_NV21, acllite::ImageFormat::YVU_SP_420 },
{ BUF_COLOR_FORMAT_RGB, acllite::ImageFormat::RGB_888 },
{ BUF_COLOR_FORMAT_BGR, acllite::ImageFormat::BGR_888 },
};
return color_map[format];
}
#endif // COMMON_UTILS_HPP_
上面这个文件内容目前是改成了华为acl相关的,现在把整个成功移植到英伟达的Jetson,那么这个文件的内容也要修改,另外,整个工程中所有用到了ACL_SAFECALL和CALL_ACL_FUNC的地方,不仅要把ACL_SAFECALL和CALL_ACL_FUNC这两个名字改掉,还要把用到这两个名字的硬件相关接口全都修改掉,比如
比如这张截图中,所有的这些acllite::AclLiteMalloc acllite::AclLiteFree acllite::AclLiteMemcpy这些接口都要相应的改成英伟达Jetson平台的接口。
2.1 infer_server/include/common/utils.hpp文件内容修改
/*************************************************************************
* Copyright (C) [2022] by Cambricon, Inc. All rights reserved
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*************************************************************************/
#ifndef COMMON_UTILS_HPP_
#define COMMON_UTILS_HPP_
#include <string>
#include <glog/logging.h>
#include "buf_surface.h"
#include "cuda_runtime.h"
#include "nvcv/ImageFormat.h"
#define _SAFECALL(func, expected, msg, ret_val) \
do { \
int _ret = (func); \
if ((expected) != _ret) { \
LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg; \
return (ret_val); \
} \
} while (0)
#define CUDA_SAFECALL(func, msg, ret_val) _SAFECALL(func, cudaSuccess, msg, ret_val)
#define _CALLFUNC(func, expected, msg) \
do { \
int _ret = (func); \
if ((expected) != _ret) { \
LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg; \
} \
} while (0)
#define CALL_CUDA_FUNC(func, msg) _CALLFUNC(func, cudaSuccess, msg)
inline BufSurfaceMemType CastMemoryType(cudaMemoryType type) noexcept{
switch (type) {
case cudaMemoryTypeUnregistered:
return BUF_MEMORY_UNREGISTERED;
case cudaMemoryTypeHost:
return BUF_MEMORY_HOST;
case cudaMemoryTypeDevice:
return BUF_MEMORY_DEVICE;
case cudaMemoryTypeManaged:
return BUF_MEMORY_MANAGED;
default:
LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
return BUF_MEMORY_HOST;
}
}
inline cudaMemoryType CastMemoryType(BufSurfaceMemType type) noexcept{
switch (type) {
case BUF_MEMORY_UNREGISTERED:
return cudaMemoryTypeUnregistered;
case BUF_MEMORY_HOST:
return cudaMemoryTypeHost;
case BUF_MEMORY_DEVICE:
return cudaMemoryTypeDevice;
case BUF_MEMORY_MANAGED:
return cudaMemoryTypeManaged;
default:
LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
return cudaMemoryTypeHost;
}
}
inline BufSurfaceColorFormat CastColorFmt(NVCVImageFormat format) {
static std::map<NVCVImageFormat, BufSurfaceColorFormat> color_map{
{ NVCV_IMAGE_FORMAT_NV12, BUF_COLOR_FORMAT_NV12 },
{ NVCV_IMAGE_FORMAT_NV21, BUF_COLOR_FORMAT_NV21 },
{ NVCV_IMAGE_FORMAT_RGB8, BUF_COLOR_FORMAT_RGB },
{ NVCV_IMAGE_FORMAT_BGR8, BUF_COLOR_FORMAT_BGR },
};
return color_map[format];
}
inline NVCVImageFormat CastColorFmt(BufSurfaceColorFormat format) {
static std::map<BufSurfaceColorFormat, NVCVImageFormat> color_map{
{ BUF_COLOR_FORMAT_NV12, NVCV_IMAGE_FORMAT_NV12 },
{ BUF_COLOR_FORMAT_NV21, NVCV_IMAGE_FORMAT_NV21 },
{ BUF_COLOR_FORMAT_RGB, NVCV_IMAGE_FORMAT_RGB8 },
{ BUF_COLOR_FORMAT_BGR, NVCV_IMAGE_FORMAT_BGR8 },
};
return color_map[format];
}
#endif // COMMON_UTILS_HPP_
2.2 cuda的四种内存
在CUDA编程中,内存类型是指定数据应该存储在哪种类型的内存中的关键概念。CUDA支持多种内存类型,每种类型都有其特定的用途和性能特点。以下是你提到的几种CUDA内存类型的解释和区别:
cudaMemoryTypeUnregistered
:
- 这个枚举值表示内存类型未注册。在CUDA中,通常不会直接使用这个值,因为它表示内存没有被明确指定为主机或设备内存。
cudaMemoryTypeHost
:
- 这表示内存是分配在主机(CPU)上的。主机内存可以被CPU直接访问,但GPU访问它通常较慢,因为需要通过PCIe总线进行数据传输。这种内存类型适用于需要CPU频繁访问的数据,或者在CPU和GPU之间需要频繁传输数据的场景。
cudaMemoryTypeDevice
:
- 这表示内存是分配在设备(GPU)上的。设备内存只能被GPU直接访问,CPU访问它需要通过CUDA的内存复制操作。这种内存类型适用于GPU计算密集型任务,因为数据已经在GPU上,可以减少数据传输的开销。
cudaMemoryTypeManaged
:
- 这表示内存是“统一内存”,即由CUDA统一管理的内存。在这种内存类型下,数据可以同时被CPU和GPU访问,而不需要显式的数据复制操作。CUDA运行时会自动处理数据在主机和设备之间的迁移。这种内存类型简化了内存管理,但可能会引入额外的性能开销,因为运行时需要决定何时以及如何迁移数据。
区别总结:
cudaMemoryTypeHost
和cudaMemoryTypeDevice
提供了明确的内存位置,分别对应CPU和GPU,适用于对性能有明确要求的场景。cudaMemoryTypeManaged
提供了更简单的编程模型,但可能会牺牲一些性能,因为数据迁移是自动和隐式的。cudaMemoryTypeUnregistered
通常不用于实际编程,它更多是一个占位符,表示内存类型尚未确定。
2.3 infer_server/src/core/device.cpp修改
/*************************************************************************
* Copyright (C) [2020] by Cambricon, Inc. All rights reserved
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*************************************************************************/
#include <glog/logging.h>
#include "nvis/infer_server.h"
namespace infer_server {
cudaMemcpyKind GetMemcpyKind(BufSurfaceMemType src_mem_type, BufSurfaceMemType dst_mem_type) {
// 确保未注册内存不会被使用
assert(src_mem_type != BUF_MEMORY_UNREGISTERED);
assert(dst_mem_type != BUF_MEMORY_UNREGISTERED);
// 根据源和目标内存类型确定 cudaMemcpyKind
if (src_mem_type == BUF_MEMORY_HOST && dst_mem_type == BUF_MEMORY_HOST) {
return cudaMemcpyHostToHost;
}
else if (src_mem_type == BUF_MEMORY_HOST && dst_mem_type == BUF_MEMORY_DEVICE) {
return cudaMemcpyHostToDevice;
}
else if (src_mem_type == BUF_MEMORY_DEVICE && dst_mem_type == BUF_MEMORY_HOST) {
return cudaMemcpyDeviceToHost;
}
else if (src_mem_type == BUF_MEMORY_DEVICE && dst_mem_type == BUF_MEMORY_DEVICE) {
return cudaMemcpyDeviceToDevice;
}
else if (src_mem_type == BUF_MEMORY_MANAGED || dst_mem_type == BUF_MEMORY_MANAGED) {
// 管理内存可以视为主机或设备内存
return cudaMemcpyDefault;
}
// 默认情况下返回 cudaMemcpyDefault
return cudaMemcpyDefault;
}
bool SetCurrentDevice(int device_id) noexcept{
CUDA_SAFECALL(cudaSetDevice(device_id), "Set device failed", false);
VLOG(3) << "[InferServer] SetCurrentDevice(): Set device [" << device_id << "] for this thread";
return true;
}
uint32_t TotalDeviceCount() noexcept{
uint32_t dev_cnt;
CUDA_SAFECALL(cudaGetDeviceCount(dev_cnt), "Set device failed", 0);
return dev_cnt;
}
bool CheckDevice(int device_id) noexcept{
uint32_t dev_cnt;
CUDA_SAFECALL(cudaGetDeviceCountdev_cnt), "Check device failed", false);
return device_id < static_cast<int>(dev_cnt) && device_id >= 0;
}
void* MallocDeviceMem(size_t size) noexcept{
void *device_ptr{};
CUDA_SAFECALL(cudaMallocManaged(&device_ptr, size), "Malloc device memory failed", nullptr);
return device_ptr;
}
int FreeDeviceMem(void *p) noexcept{
CUDA_SAFECALL(cudaFree(p), "Free device memory failed", -1);
return 0;
}
void* AllocHostMem(size_t size) noexcept{
void *host_ptr{};
CUDA_SAFECALL(cudaMallocHost(&host_ptr, size), "Malloc host memory failed", nullptr);
return host_ptr;
}
int FreeHostMem(void *p) noexcept{
CUDA_SAFECALL(cudaFreeHost(p), "Free host memory failed", -1);
}
int MemcpyHD(void* dst, BufSurfaceMemType dst_mem_type, void* src, BufSurfaceMemType src_mem_type, size_t size) noexcept{
cudaMemcpyKind cpy_type = GetMemcpyKind(src_mem_type, dst_mem_type);
CUDA_SAFECALL(cudaMemcpy(dst, src, size, cpy_type), "Memcpy HD failed", -1);
return 0;
}
bool IsItegratedGPU(int device_id) {
static int s_integrated = [device_id]() {
cudaDeviceProp prop;
CUDA_SAFECALL(cudaGetDeviceProperties(&prop, device_id));
};
return s_integrated == 1;
}
int GetCurrentDevice(int& device_id) noexcept{
CUDA_SAFECALL(cudaGetDevice(&device_id));
return 0;
}
} // namespace infer_server
3 图像缩放、裁剪、色域转换等代码编写----利用CV-CUDA
3.1 nvstream中图像处理代码流程框架
和前面一样,先大体看一下nvstream中图像处理的代码流程框架。
3.1.1 类的层次关系
class TransformService
class ITransformer 只是个基类,被用来继承的,
class TransformerAcl : public ITransformer 然后就是硬件处理的了
AclLiteImageProc
现在应该是直接把TransformerAcl 和 AclLiteImageProc合成一个英伟达的类。
3.1.2 各个类的初始化函数调用层次关系
在视频解码类的create函数里面,或者在算法的Preproc都会有这样一行
if (TransformCreate(&transformer_, &config) != 0)这个transformer_ 是视频解码类或者算法预处理类的一个成员。
这个create函数是这样的,
int TransformCreate(void **transformer, TransformConfigParams *params) {
return infer_server::TransformService::Instance().Create(transformer, params);
}
然后TransformService类的create函数里面有
ITransformer *transformer_ = CreateTransformer(); CreateTransformer里面就是new TransformerAcl也就是具体的硬件处理类了
transformer_->Create(params)
*transformer = transformer_;回传给解码的那个类或者算法预处理类,
3.1.3 各个类的transform函数调用层次关系
具体硬件解码处理类的OnFrame函数里面会有一个
Transform(transformer_, &transform_src, surf, &trans_params)
应该是调用了这个
int Transform(void *transformer, BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
return infer_server::TransformService::Instance().Transform(transformer, src, dst, transform_params);
}
然后到了这里
int Transform(void *transformer, BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
if (!dst || !src || !transform_params) {
LOG(ERROR) << "[InferServer] [TransformService] Transform(): src, dst BufSurface or parameters pointer is invalid";
return -1;
}
ITransformer *transformer_ = static_cast<ITransformer *>(transformer); 那个具体的硬件处理类就是继承的ITransformer
return transformer_->Transform(src, dst, transform_params);这就到了具体硬件transform类的函数了,
}
3.2 编写图像缩放、裁剪、色域转换等代码
3.2.1 infer_server/src/nv/transform_impl_nv.hpp
#ifndef TRANSFORM_IMPL_NV_HPP_
#define TRANSFORM_IMPL_NV_HPP_
#include <algorithm>
#include <cstring> // for memset
#include <atomic>
#include "../transform_impl.hpp"
#include "transform.h"
namespace infer_server {
class TransformerNV : public ITransformer {
public:
TransformerNV() {
}
~TransformerNV() = default;
int Create(TransformConfigParams *params) override;
int Destroy() override;
int Transform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) override;
private:
int DoNVTransform(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
int NVResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
int NVCrop(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
int NVCropResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
int NVCropResizePaste(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
int NVConvertFormat(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
private:
TransformConfigParams create_params_;
cudaStream_t* cu_stream_{nullptr};
std::shared_ptr<nvcv::ITensor> crop_tensor_;
std::shared_ptr<nvcv::ITensor> resized_tensor_;
std::shared_ptr<nvcv::ITensor> cvtcolor_tensor_;
std::shared_ptr<nvcv::ITensor> copymakeborder_tensor_;
std::shared_ptr<cvcuda::CustomCrop> crop_op_;
std::shared_ptr<cvcuda::Resize> resize_op_;
std::shared_ptr<cvcuda::CvtColor> cvtcolor_op_;
std::shared_ptr<cvcuda::CopyMakeBorder> copymakeborder_op_;
std::atomic<bool> created_{};
};
} // namespace
#endif // TRANSFORM_IMPL_MLU370_HPP_
3.2.2 infer_server/src/nv/transform_impl_nv.cpp
#include "transform_impl_nv.hpp"
#include <algorithm>
#include <atomic>
#include <cstring> // for memset
#include <map>
#include <memory>
#include <string>
#include <vector>
#include "glog/logging.h"
namespace infer_server {
int TransformerNV::Create(TransformConfigParams *params) {
create_params_ = *params;
if(nullptr == cu_stream_)
{
cuCreateStream(&cu_stream_, create_params_.device_id);
}
if (!crop_op_){
crop_op_ = std::make_shared<cvcuda::CustomCrop>();
}
if (!cvtcolor_op_){
cvtcolor_op_ = std::make_shared<cvcuda::CvtColor>();
}
if (!resize_op_){
resize_op_ = std::make_shared<cvcuda::Resize>();
}
if (!copymakeborder_op_){
copymakeborder_op_ = std::make_shared<cvcuda::CopyMakeBorder>();
}
created_ = true;
return 0;
}
int TransformerNV::Destroy() {
if (!created_) {
LOG(WARNING) << "[InferServer] [TransformerNV] Destroy(): Transformer is not created";
return 0;
}
if (cu_stream_ != nullptr) {
cuDestroyStream(cu_stream_);
cu_stream_ = nullptr;
}
cvtcolor_op_.reset();
resize_op_.reset();
crop_op_.reset();
copymakeborder_op_.reset();
return 0;
}
int TransformerNV::Transform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
if (!created_) {
LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): transformer is not created";
return -1;
}
if (src->num_filled > dst->batch_size) {
LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The number of inputs exceeds batch size: "
<< src->num_filled << " v.s. " << dst->batch_size;
return -1;
}
if (src->device_id != dst->device_id || src->device_id != create_params_.device_id) {
LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The device id of src, dst and transformer is not the same: src device: " << src->device_id
<< " , dst device: " << dst->device_id
<< " , transformer device: " << create_params_.device_id;
return -1;
}
if (src->mem_type != BUF_MEMORY_DVPP) {
LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The src and dst mem_type must be BUF_MEMORY_DVPP";
return -1;
}
if (src->surface_list[0].data_size == 0) {
LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): Input data size is 0";
return -1;
}
return DoNVTransform(src, dst, transform_params);
}
int TransformerNV::DoNVTransform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
switch (transform_params->transform_flag) {
case TRANSFORM_RESIZE_SRC:
return NVResize(src, dst, transform_params);
break;
case TRANSFORM_CROP_SRC:
return NVCrop(src, dst, transform_params);
break;
case TRANSFORM_CROP_RESIZE_SRC:
return NVCropResize(src, dst, transform_params);
break;
case TRANSFORM_CROP_RESIZE_PASTE_SRC:
return NVCropResizePaste(src, dst, transform_params);
break;
default:
LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): Transform flag not supported currently, flag = " << transform_params->transform_flag;
return -1;
}
}
int TransformerNV::NVResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
if (src->batch_size != 1 || dst->batch_size != 1) {
LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): NVResize now only support src/dst one image, src batch size = " << src->batch_size <<" , dst batch size = " << dst->batch_size;
return -1;
}
auto& src_surf = src->surface_list[0];
nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
nvcv::TensorDataStridedCuda::Buffer in_buf;
std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);
nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
nvcv::TensorWrapData in_tensor(in_data);
if (!resized_tensor_){
resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {dst.width_stride, src_surf.height}, nvcv::FMT_BGR8));
}
(*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *resized_tensor_, NVCV_INTERP_LINEAR);
auto& dst_surf = dst->surface_list[0];
auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();
cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
cuStreamSynchronize(cu_stream_);
resized_tensor_.reset();
return 0;
}
int TransformerNV::NVCrop(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
auto& src_surf = src->surface_list[0];
auto& dst_surf = dst->surface_list[0];
nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
nvcv::TensorDataStridedCuda::Buffer in_buf;
std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);
nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
nvcv::TensorWrapData in_tensor(in_data);
if (!crop_tensor_){
crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {dst_surf.width, dst_surf.height}, nvcv::FMT_BGR8));
}
TransformRect rect = transform_params->src_rect[0];
rect.left = rect.left >= src_surf.width ? 0 : rect.left;
rect.top = rect.top >= src_surf.height ? 0 : rect.top;
rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
NVCVRectI crpRect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };
(*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crpRect);
auto out_data = crop_tensor_->exportData<nvcv::TensorDataStridedCuda>();
cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
cuStreamSynchronize(cu_stream_);
crop_tensor_.reset();
return 0;
}
int TransformerNV::NVCropResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
auto& src_surf = src->surface_list[0];
auto& dst_surf = dst->surface_list[0];
nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
nvcv::TensorDataStridedCuda::Buffer in_buf;
std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);
nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
nvcv::TensorWrapData in_tensor(in_data);
TransformRect rect = transform_params->src_rect[0];
rect.left = rect.left >= src_surf.width ? 0 : rect.left;
rect.top = rect.top >= src_surf.height ? 0 : rect.top;
rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
NVCVRectI crpRect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };
if (!crop_tensor_){
crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {rect.width, rect.height}, nvcv::FMT_BGR8));
}
(*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crpRect);
if (!resized_tensor_){
resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, { dst_surf.width, dst_surf.height }, nvcv::FMT_BGR8));
}
(*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *crop_tensor_, *resized_tensor_, NVCV_INTERP_LINEAR);
auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();
cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
cuStreamSynchronize(cu_stream_);
crop_tensor_.reset();
resized_tensor_.reset();
return 0;
}
int TransformerNV::NVCropResizePaste(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
auto& src_surf = src->surface_list[0];
auto& dst_surf = dst->surface_list[0];
nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
nvcv::TensorDataStridedCuda::Buffer in_buf;
std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);
nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
nvcv::TensorWrapData in_tensor(in_data);
TransformRect rect = transform_params->src_rect[0];
rect.left = rect.left >= src_surf.width ? 0 : rect.left;
rect.top = rect.top >= src_surf.height ? 0 : rect.top;
rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
NVCVRectI crop_src_rect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };
TransformRect dst_rect = transform_params->dst_rect[0];
dst_rect.left = dst_rect.left >= src_surf.width ? 0 : dst_rect.left;
dst_rect.top = dst_rect.top >= src_surf.height ? 0 : dst_rect.top;
dst_rect.width = dst_rect.width <= 0 ? (src_surf.width - dst_rect.left) : dst_rect.width;
dst_rect.height = dst_rect.height <= 0 ? (src_surf.height - dst_rect.top) : dst_rect.height;
NVCVRectI paste_dst_rect(dst_rect.left, dst_rect.top, dst_rect.left + dst_rect.width - 1, dst_rect.top + dst_rect.height - 1);
if (!crop_tensor_){
crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {rect.width, rect.height}, nvcv::FMT_BGR8));
}
(*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crop_src_rect);
if (!resized_tensor_){
resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, { dst_rect.width, dst_rect.height }, nvcv::FMT_BGR8));
}
(*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *crop_tensor_, *resized_tensor_, NVCV_INTERP_LINEAR);
if (!copymakeborder_tensor_){
copymakeborder_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, { dst_surf.width, dst_surf.height }, nvcv::FMT_BGR8));
}
(*copymakeborder_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *resized_tensor_, *copymakeborder_tensor_, dst_rect.top, dst_rect.left, NVCV_BORDER_CONSTANT, 0);
auto out_data = copymakeborder_tensor_->exportData<nvcv::TensorDataStridedCuda>();
cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
cuStreamSynchronize(cu_stream_);
crop_tensor_.reset();
resized_tensor_.reset();
copymakeborder_tensor_.reset();
return 0;
}
int TransformerNV::NVConvertFormat(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
auto& src_surf = src->surface_list[0];
nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_NV12);
nvcv::TensorDataStridedCuda::Buffer in_buf;
std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);
nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
nvcv::TensorWrapData in_tensor(in_data);
if (!cvtcolor_tensor_){
cvtcolor_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8));
}
(*cvtcolor_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *cvtcolor_tensor_, NVCV_COLOR_YUV2BGR_NV12);
auto& dst_surf = dst->surface_list[0];
auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();
cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
cuStreamSynchronize(cu_stream_);
cvtcolor_tensor_.reset();
return 0;
}
} // namespace
4 算法推理相关代码修改
4.1 ./infer_server/src/model/model.h
/*************************************************************************
* Copyright (C) [2020] by Cambricon, Inc. All rights reserved
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*************************************************************************/
#ifndef INFER_SERVER_MODEL_H_
#define INFER_SERVER_MODEL_H_
#include <glog/logging.h>
#include <algorithm>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>
#include "nvis/infer_server.h"
#include "nvis/processor.h"
#include "nvis/shape.h"
namespace trt {
#include "trteng_exp/export_funtions.h"
}
namespace infer_server {
using TEngine = std::unique_ptr<trt::trtnet_t>;
class Model;
class ModelRunner {
public:
explicit ModelRunner(int device_id) : device_id_(device_id) {}
ModelRunner(const ModelRunner& other) = delete;
ModelRunner& operator=(const ModelRunner& other) = delete;
ModelRunner(ModelRunner&& other) = default;
ModelRunner& operator=(ModelRunner&& other) = default;
~ModelRunner() = default;
bool Init(TEngine engine) noexcept;
Status Run(ModelIO* input, ModelIO* output) noexcept; // NOLINT
void SetInputNum(uint32_t i_num) noexcept{ input_num_ = i_num; }
void SetOutputNum(uint32_t o_num) noexcept{ output_num_ = o_num; }
void SetInputShapes(std::vector<Shape> const& i_shapes) noexcept{ i_shapes_ = i_shapes; }
void SetOutputShapes(std::vector<Shape> const& o_shapes) noexcept{ o_shapes_ = o_shapes; }
void SetInputLayouts(std::vector<DataLayout> const& i_layouts) noexcept{ i_layouts_ = i_layouts; }
void SetOutputLayouts(std::vector<DataLayout> const& o_layouts) noexcept{ o_layouts_ = o_layouts; }
private:
TEngine engine_{};
uint32_t input_num_{};
uint32_t output_num_{};
std::vector<Shape> i_shapes_;
std::vector<Shape> o_shapes_;
std::vector<DataLayout> i_layouts_;
std::vector<DataLayout> o_layouts_;
int device_id_{};
}; // class RuntimeContext
class Model : public ModelInfo {
public:
Model() = default;
bool Init(const std::string& model_path) noexcept;
~Model();
bool HasInit() const noexcept{ return has_init_; }
const Shape& InputShape(int index) const noexcept override{
CHECK(index < i_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
return input_shapes_[index];
}
const Shape& OutputShape(int index) const noexcept override{
CHECK(index < o_num_ || index >= 0) << "[InferServer] [Model] Output shape index overflow";
return output_shapes_[index];
}
const DataLayout& InputLayout(int index) const noexcept override{
CHECK(index < i_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
return i_layouts_[index];
}
const DataLayout& OutputLayout(int index) const noexcept override{
CHECK(index < o_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
return o_layouts_[index];
}
uint32_t InputNum() const noexcept override{ return i_num_; }
uint32_t OutputNum() const noexcept override{ return o_num_; }
uint32_t BatchSize() const noexcept override{ return model_batch_size_; }
bool FixedOutputShape() noexcept override{ return FixedShape(output_shapes_); }
std::shared_ptr<ModelRunner> GetRunner(int device_id) noexcept{
trt::ErrInfo ei{};
trt::trtnet_t* net = trt::load_net_from_file(model_file_.data(), &ei);
CHECK(!net) << "trt::load_net_from_file failed: model file: " << te2fullpath << ", error: " << ei.errmsg;
TEngine engine = TEngine(net, [](trt::trtnet_t* n) { trt::release_net(n); });
auto runner = std::make_shared<ModelRunner>(device_id);
runner->SetInputNum(i_num_);
runner->SetOutputNum(o_num_);
runner->SetInputShapes(input_shapes_);
runner->SetOutputShapes(output_shapes_);
runner->SetInputLayouts(i_layouts_);
runner->SetOutputLayouts(o_layouts_);
if (!runner->Init(std::move(engine))) return nullptr;
return runner;
}
std::string GetKey() const noexcept override{ return model_file_; }
private:
bool GetModelInfo(trt::trtnet_t* net) noexcept;
bool FixedShape(const std::vector<Shape>& shapes) noexcept{
for (auto &shape : shapes) {
auto vectorized_shape = shape.Vectorize();
if (!std::all_of(vectorized_shape.begin(), vectorized_shape.end(), [](int64_t v) { return v > 0; })) {
return false;
}
}
return !shapes.empty();
}
Model(const Model&) = delete;
Model& operator=(const Model&) = delete;
private:
std::string model_file_;
std::vector<DataLayout> i_layouts_, o_layouts_;
std::vector<Shape> input_shapes_, output_shapes_;
int i_num_{}, o_num_{};
uint32_t model_batch_size_{ 1 };
bool has_init_{ false };
}; // class Model
// use environment CNIS_MODEL_CACHE_LIMIT to control cache limit
class ModelManager {
public:
static ModelManager* Instance() noexcept;
void SetModelDir(const std::string& model_dir) noexcept{ model_dir_ = model_dir; }
ModelPtr Load(const std::string& model_file) noexcept;
ModelPtr Load(void* mem_ptr, size_t size) noexcept;
bool Unload(ModelPtr model) noexcept;
void ClearCache() noexcept;
int CacheSize() noexcept;
std::shared_ptr<Model> GetModel(const std::string& name) noexcept;
private:
std::string DownloadModel(const std::string& url) noexcept;
void CheckAndCleanCache() noexcept;
std::string model_dir_{ "." };
static std::unordered_map<std::string, std::shared_ptr<Model>> model_cache_;
static std::mutex model_cache_mutex_;
}; // class ModelManager
} // namespace infer_server
#endif // INFER_SERVER_MODEL_H_
4.2 ./infer_server/src/model/model.cpp
/*************************************************************************
* Copyright (C) [2020] by Cambricon, Inc. All rights reserved
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*************************************************************************/
#include "model.h"
#include <glog/logging.h>
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include "core/data_type.h"
#include "common/utils.hpp"
using std::string;
using std::vector;
namespace infer_server {
bool ModelRunner::Init(TEngine engine) noexcept{
if (engine == nullptr) return false;
engine_ = std::move(engine);
return true;
}
Status ModelRunner::Run(ModelIO* in, ModelIO* out) noexcept{ // NOLINT
auto& input = in->surfs;
auto& output = out->surfs;
CHECK_EQ(input_num_, input.size()) << "[InferServer] [ModelRunner] Input number is mismatched";
VLOG(5) << "[InferServer] [ModelRunner] Process inference once, input num: " << input_num_ << " output num: "
<< output_num_;
std::vector<trt::NetInoutLayerData> inputData(input_num_);
std::vector<trt::NetInoutLayerData> outputData(output_num_);
for (uint32_t i_idx = 0; i_idx < input_num_; ++i_idx) {
CHECK_EQ(input_num_, input.size()) << "[InferServer] [ModelRunner] Input number is mismatched";
inputData[i].data = input[i_idx]->GetData(0);
inputData[i].size = input[i_idx]->GetSize(0);
inputData[i].layer_idx = i_idx;
}
for (uint32_t o_idx = 0; o_idx < output_num_; ++o_idx) {
CHECK_EQ(output_num_, output.size()) << "[InferServer] [ModelRunner] Output number is mismatched";
outputData[i].data = input[o_idx]->GetData(0);
outputData[i].size = input[o_idx]->GetSize(0);
outputData[i].layer_idx = o_idx;
}
uint32_t batchsize = input[0]->GetNumFilled();
trt::ErrInfo ei{};
_SAFECALL(trt::net_do_inference(engine_.get(), batchsize, inputData.data(), inputData.size(), outputData.data(), outputData.size(), &ei)
, "[InferServer] [ModelRunner] Infer failed.", Status::ERROR_BACKEND);
return Status::SUCCESS;
}
bool Model::Init(const string& model_file) noexcept{
model_file_ = model_file;
trt::ErrInfo ei{};
trt::trtnet_t* net = trt::load_net_from_file(model_file_.data(), &ei);
CHECK(!net) << "trt::load_net_from_file failed: model file: " << te2fullpath << ", error: " << ei.errmsg;
has_init_ = GetModelInfo(model);
VLOG(1) << "[InferServer] [Model] (success) Load model from file: " << model_file_;
trt::release_net(net);
return has_init_;
}
bool Model::GetModelInfo(trt::trtnet_t* net) noexcept{
VLOG(1) << "[InferServer] [Model] (success) Load model from graph file: " << model_file_;
int model_batch_size_ = trt::net_max_batch_size(net);
// get IO messages
// get io number and data size
i_num_ = trt::net_num_inputs(net);
o_num_ = trt::net_num_outputs(net);
// get input info
for (int i = 0; i < ninp; i++) {
trt::LayerDims ldim{};
trt::net_input_layer_dims(net, i, &ldim);
input_shapes_.emplace_back(std::move(Shape({ std::max(ldim.n, model_batch_size_), ldim.c, ldim.h, ldim.w })));
DataLayout layout;
layout.dtype = DataType::FLOAT;
layout.order = DimOrder::NCHW;
i_layouts_.emplace_back(std::move(layout));
}
// get output info
int noup = net_num_outputs(net);
runner->SetOutputNum(noup);
for (int i = 0; i < noup; i++) {
trt::LayerDims ldim{};
trt::net_output_layer_dims(net, i, &ldim);
output_shapes_.emplace_back(std::move(Shape({ std::max(ldim.n, model_batch_size_), ldim.c, ldim.h, ldim.w })));
DataLayout layout;
layout.dtype = DataType::FLOAT;
layout.order = DimOrder::NCHW;
o_layouts_.emplace_back(std::move(layout));
}
VLOG(1) << "[InferServer] [Model] Model Info: input number = " << i_num_ << ";\toutput number = " << o_num_;
VLOG(1) << "[InferServer] [Model] batch size = " << model_batch_size_;
for (int i = 0; i < i_num_; ++i) {
VLOG(1) << "[InferServer] [Model] ----- input index [" << i;
VLOG(1) << "[InferServer] [Model] data type " << detail::DataTypeStr(i_layouts_[i].dtype);
VLOG(1) << "[InferServer] [Model] dim order " << detail::DimOrderStr(i_layouts_[i].order);
VLOG(1) << "[InferServer] [Model] shape " << input_shapes_[i];
}
for (int i = 0; i < o_num_; ++i) {
VLOG(1) << "[InferServer] [Model] ----- output index [" << i;
VLOG(1) << "[InferServer] [Model] data type " << detail::DataTypeStr(o_layouts_[i].dtype);
VLOG(1) << "[InferServer] [Model] dim order " << detail::DimOrderStr(o_layouts_[i].order);
VLOG(1) << "[InferServer] [Model] shape " << output_shapes_[i];
}
return true;
}
Model::~Model() {
VLOG(1) << "[InferServer] [Model] Unload model: " << model_file_;
}
} // namespace infer_server
5 其他代码修改
其他的还有很多零碎代码,比如一些命名空间的名字,还有一些其他名称,还有很多文件里面调用的一些函数的名字,太乱了,不写到博客里面了。
参考文献:
在NVIDIA Jetson AGX Orin中使用jetson-ffmpeg调用硬件编解码加速处理-CSDN博客
NVIDIA Jetson AGX Orin源码编译安装CV-CUDA-CSDN博客
GitHub - Cambricon/CNStream: CNStream is a streaming framework for building Cambricon machine learning pipelines http://forum.cambricon.com https://gitee.com/SolutionSDK/CNStream
easydk/samples/simple_demo/common/video_decoder.cpp at master · Cambricon/easydk · GitHub
aclStream流处理多路并发Pipeline框架中 视频解码 代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客
aclStream流处理多路并发Pipeline框架中VEncode Module代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客
FFmpeg/doc/examples at master · FFmpeg/FFmpeg · GitHub
GitHub - CVCUDA/CV-CUDA: CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
如何使用FFmpeg的解码器—FFmpeg API教程 · FFmpeg原理
C++ API — CV-CUDA Beta documentation (cvcuda.github.io)
CV-CUDA/tests/cvcuda/system at main · CVCUDA/CV-CUDA · GitHub
Resize — CV-CUDA Beta documentation
CUDA Runtime API :: CUDA Toolkit Documentation
CUDA Toolkit Documentation 12.6 Update 1