DB Algorithm: Principles and Model Construction

news2024/9/28 17:33:14

References:
https://aistudio.baidu.com/projectdetail/4483048

Real-Time Scene Text Detection with Differentiable Binarization

How to Read a Paper, by Mu Li (李沐)

DB (Real-Time Scene Text Detection with Differentiable Binarization)

Principles

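DB is a segmentation-based text detection algorithm. Its key idea is differentiable binarization: instead of one fixed threshold, it uses a dynamic threshold to separate text regions from the background.
[Figure: pipeline of a conventional segmentation-based detector (blue arrows) versus DB (red arrows)]
A conventional segmentation-based text detector follows the blue arrows in the figure above: after the segmentation result is obtained, a fixed threshold produces the binarized segmentation map (standard binarization is not differentiable, so it cannot be trained end to end with the network), and a heuristic such as pixel clustering then groups pixels into text regions.

DB follows the red arrows in the figure. The main difference is the threshold map: the network predicts a threshold for every position in the image rather than using one fixed value, which separates text foreground from background more cleanly.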
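
To make the difference between a fixed threshold and a per-pixel threshold concrete, here is a minimal sketch on a toy 1-D "probability map". The numbers and the names `binarize_fixed` and `binarize_adaptive` are made up for illustration; they are not part of PaddleOCR or the paper.

```python
# Toy 1-D "probability map": a faint text region (indices 1-3)
# next to a bright background patch (index 5). Values are invented.
prob = [0.10, 0.35, 0.40, 0.38, 0.12, 0.55, 0.15]

# Hypothetical per-pixel thresholds, standing in for a predicted threshold map.
thresh_map = [0.30, 0.30, 0.30, 0.30, 0.30, 0.60, 0.30]


def binarize_fixed(p, t=0.5):
    """Standard binarization: one global threshold for every pixel."""
    return [1 if v >= t else 0 for v in p]


def binarize_adaptive(p, t_map):
    """Binarization with a separate threshold for every pixel."""
    return [1 if v >= t else 0 for v, t in zip(p, t_map)]


fixed = binarize_fixed(prob)
adaptive = binarize_adaptive(prob, thresh_map)
print("fixed   :", fixed)
print("adaptive:", adaptive)
```

In this toy case the fixed threshold misses the faint text and keeps the bright background pixel, while the per-pixel threshold does the opposite. Both versions are still hard step functions, so neither is differentiable yet.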

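Advantages:
1. The architecture is simple and needs no complicated post-processing.
2. It achieves good accuracy and speed on open-source datasets.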

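DB introduces differentiable binarization, which approximates the step function of standard binarization with the following function:

\hat{B}_{i,j} = \frac{1}{1 + e^{-k (P_{i,j} - T_{i,j})}}

Here P is the probability map, T is the threshold map, and k is the amplification factor (k=50 in the DBHead code below, where this formula is implemented as step_function).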
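
As a small numeric sketch of this approximation: the function 1 / (1 + exp(-k (P - T))) is the same expression that `step_function` in the DBHead code computes, with k=50. The helper names `step`, `db`, and `db_grad_p` below are illustrative only.

```python
import math


def step(p, t):
    """Standard binarization: a hard step, with zero gradient almost everywhere."""
    return 1.0 if p >= t else 0.0


def db(p, t, k=50.0):
    """Differentiable binarization: a steep sigmoid of (p - t)."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def db_grad_p(p, t, k=50.0):
    """Derivative of db with respect to p: k * b * (1 - b)."""
    b = db(p, t, k)
    return k * b * (1.0 - b)


for p in (0.3, 0.45, 0.5, 0.55, 0.7):
    print(p, step(p, 0.5), round(db(p, 0.5), 4), round(db_grad_p(p, 0.5), 4))
```

Away from the threshold the two functions almost agree, while near the threshold the sigmoid has a large but finite gradient, which is what lets the binarization step take part in training.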
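Overall structure of DB:
[Figure: overall architecture of the DB network]
The input image passes through the backbone and the FPN to extract features. The extracted features are concatenated into a feature map one quarter the size of the original image. Convolutional layers then produce the text-region probability map and the threshold map, and DB post-processing turns these into the curves that enclose the text.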

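DB Text Detection Model Construction

The DB text detection model consists of three parts:

Backbone network, which extracts image features
FPN network, a feature pyramid structure that enhances the features
Head network, which computes the text-region probability map

Backbone network: the paper uses ResNet50. To speed up training, this experiment uses the MobileNetV3 large structure as the backbone.

The DB backbone extracts multi-scale features from the image, as shown in the code below. Assuming a 640×640 input, the backbone outputs four feature maps with shapes [1, 16, 160, 160], [1, 24, 80, 80], [1, 56, 40, 40], and [1, 480, 20, 20]. These features are passed to the feature pyramid network (FPN) for further enhancement.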

import paddle 
from ppocr.modeling.backbones.det_mobilenet_v3 import MobileNetV3

fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")

# 1. Build the backbone
model_backbone = MobileNetV3()
model_backbone.eval()

# 2. Run a forward pass
outs = model_backbone(fake_inputs)

# 3. Print the network structure
# print(model_backbone)

# 4. Print the shape of each output feature map
for idx, out in enumerate(outs):
    print("The index is ", idx, "and the shape of output is ", out.shape)
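
The four shapes can also be checked by hand, without installing Paddle. The sketch below copies the `make_divisible` rounding rule from the complete listing later in this post and assumes the default `scale=0.5` of the large model. The `stages` table (unscaled output channels and cumulative stride of each stage) is my own reading of the configuration, not something PaddleOCR exposes.

```python
def make_divisible(v, divisor=8, min_value=None):
    """Round a channel count to a multiple of `divisor` (same rule as the full listing)."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


scale = 0.5        # default width multiplier of MobileNetV3 in the listing
input_size = 640   # height and width of the fake input

# (unscaled output channels of the stage, cumulative stride), read off the
# "large" configuration; this table is an assumption made for illustration.
stages = [(24, 4), (40, 8), (112, 16), (960, 32)]

shapes = [
    [1, make_divisible(scale * c), input_size // s, input_size // s]
    for c, s in stages
]
for idx, shape in enumerate(shapes):
    print("stage", idx, "->", shape)
```

The channel counts are not simply half of the base values: 24 × 0.5 = 12 is rounded up to 16 and 40 × 0.5 = 20 is rounded up to 24, because every width must be a multiple of 8.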

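FPN Network

The feature pyramid network (FPN) is a common convolutional architecture for efficiently extracting features at multiple scales from an image.
The FPN takes the backbone outputs as its input and produces a feature map whose height and width are one quarter of the original image. For an input image of shape [1, 3, 640, 640], the height and width of the FPN output are [160, 160].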

import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr

class DBFPN(nn.Layer):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(DBFPN, self).__init__()
        self.out_channels = out_channels

        # The layer definitions are omitted here. For the full DBFPN implementation see:
        # https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppocr/modeling/necks/db_fpn.py

    def forward(self, x):
        c2, c3, c4, c5 = x

        in5 = self.in5_conv(c5)
        in4 = self.in4_conv(c4)
        in3 = self.in3_conv(c3)
        in2 = self.in2_conv(c2)

        # Top-down pathway: upsample the coarser map and add it to the finer one
        out4 = in4 + F.upsample(
            in5, scale_factor=2, mode="nearest", align_mode=1)  # 1/16
        out3 = in3 + F.upsample(
            out4, scale_factor=2, mode="nearest", align_mode=1)  # 1/8
        out2 = in2 + F.upsample(
            out3, scale_factor=2, mode="nearest", align_mode=1)  # 1/4

        p5 = self.p5_conv(in5)
        p4 = self.p4_conv(out4)
        p3 = self.p3_conv(out3)
        p2 = self.p2_conv(out2)

        # Upsample every level to 1/4 scale before concatenation
        p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
        p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
        p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)

        fuse = paddle.concat([p5, p4, p3, p2], axis=1)
        return fuse
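
The top-down fusion in `forward` can be illustrated without Paddle. The sketch below works on tiny single-channel maps stored as nested lists. `upsample_nearest`, `add_maps`, and `fused_shape` are hypothetical helpers written for this post, and the shape arithmetic assumes `out_channels=256` and a 640×640 input as in the text.

```python
def upsample_nearest(grid, scale):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(scale)]   # repeat each column
        out.extend(list(wide) for _ in range(scale))    # repeat each row
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy stand-ins for in5 (coarsest level) and in4 (one level finer).
in5 = [[10]]
in4 = [[1, 2],
       [3, 4]]
out4 = add_maps(in4, upsample_nearest(in5, 2))
print(out4)


def fused_shape(input_size=640, out_channels=256):
    """Shape of the concatenated output: four levels of out_channels // 4 channels at 1/4 scale."""
    side = input_size // 4
    return [1, 4 * (out_channels // 4), side, side]


print(fused_shape())
```

Each of p5, p4, p3, and p2 carries `out_channels // 4` channels, so concatenating the four levels restores `out_channels` channels at one quarter of the input resolution.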

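Head Network

The head computes the text-region probability map, the threshold map, and the binary map.
The DB head upsamples the FPN features, mapping them from one quarter of the original image size back to the full image size.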


import math
import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr

class DBHead(nn.Layer):
    """
    Differentiable Binarization (DB) for text detection:
        see https://arxiv.org/abs/1911.08947
    args:
        params(dict): super parameters for build DB network
    """

    def __init__(self, in_channels, k=50, **kwargs):
        super(DBHead, self).__init__()
        self.k = k

        # The layer definitions are omitted here. For the full DBHead implementation see:
        # https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppocr/modeling/heads/det_db_head.py

    def step_function(self, x, y):
        # Differentiable binarization: compute the binary text map
        # from the probability map (x) and the threshold map (y)
        return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))

    def forward(self, x, targets=None):
        shrink_maps = self.binarize(x)
        if not self.training:
            return {'maps': shrink_maps}

        threshold_maps = self.thresh(x)
        binary_maps = self.step_function(shrink_maps, threshold_maps)
        y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
        return {'maps': y}
# 1. Import DBHead from PaddleOCR
from ppocr.modeling.heads.det_db_head import DBHead
import paddle

# 2. Compute the DBFPN output
fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")
model_backbone = MobileNetV3()
in_channels = model_backbone.out_channels
model_fpn = DBFPN(in_channels=in_channels, out_channels=256)
outs = model_backbone(fake_inputs)
fpn_outs = model_fpn(outs)

# 3. Build the head
model_db_head = DBHead(in_channels=256)

# 4. Print the DB head structure
print(model_db_head)

# 5. Compute the head output
db_head_outs = model_db_head(fpn_outs)
print(f"The shape of fpn outs {fpn_outs.shape}")
print(f"The shape of DB head outs {db_head_outs['maps'].shape}")


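Running the code above raised an error:
the class definitions are incomplete, so I downloaded the corresponding files again from the PaddleOCR repository on GitHub:
db_fpn.py
det_db_head.py

Complete code: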


import math
import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr



def make_divisible(v, divisor=8, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class MobileNetV3(nn.Layer):
    def __init__(self,
                 in_channels=3,
                 model_name='large',
                 scale=0.5,
                 disable_se=False,
                 **kwargs):
        """
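        MobileNetV3 backbone network for the detection module.
        Args:
            in_channels(int): number of channels of the input image
            model_name(str): "large" or "small"
            scale(float): channel width multiplier, one of 0.35, 0.5, 0.75, 1.0, 1.25
            disable_se(bool): if True, disable all squeeze-and-excitation modules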
        """
        super(MobileNetV3, self).__init__()

        self.disable_se = disable_se

        if model_name == "large":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, False, 'relu', 1],
                [3, 64, 24, False, 'relu', 2],
                [3, 72, 24, False, 'relu', 1],
                [5, 72, 40, True, 'relu', 2],
                [5, 120, 40, True, 'relu', 1],
                [5, 120, 40, True, 'relu', 1],
                [3, 240, 80, False, 'hardswish', 2],
                [3, 200, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 480, 112, True, 'hardswish', 1],
                [3, 672, 112, True, 'hardswish', 1],
                [5, 672, 160, True, 'hardswish', 2],
                [5, 960, 160, True, 'hardswish', 1],
                [5, 960, 160, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 960
        elif model_name == "small":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, True, 'relu', 2],
                [3, 72, 24, False, 'relu', 2],
                [3, 88, 24, False, 'relu', 1],
                [5, 96, 40, True, 'hardswish', 2],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 120, 48, True, 'hardswish', 1],
                [5, 144, 48, True, 'hardswish', 1],
                [5, 288, 96, True, 'hardswish', 2],
                [5, 576, 96, True, 'hardswish', 1],
                [5, 576, 96, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 576
        else:
            raise NotImplementedError("mode[" + model_name +
                                      "_model] is not implemented!")

        supported_scale = [0.35, 0.5, 0.75, 1.0, 1.25]
        assert scale in supported_scale, \
            "supported scale are {} but input scale is {}".format(supported_scale, scale)
        inplanes = 16
        # conv1
        self.conv = ConvBNLayer(
            in_channels=in_channels,
            out_channels=make_divisible(inplanes * scale),
            kernel_size=3,
            stride=2,
            padding=1,
            groups=1,
            if_act=True,
            act='hardswish')

        self.stages = []
        self.out_channels = []
        block_list = []
        i = 0
        inplanes = make_divisible(inplanes * scale)
        for (k, exp, c, se, nl, s) in cfg:
            se = se and not self.disable_se
            start_idx = 2 if model_name == 'large' else 0
            if s == 2 and i > start_idx:
                self.out_channels.append(inplanes)
                self.stages.append(nn.Sequential(*block_list))
                block_list = []
            block_list.append(
                ResidualUnit(
                    in_channels=inplanes,
                    mid_channels=make_divisible(scale * exp),
                    out_channels=make_divisible(scale * c),
                    kernel_size=k,
                    stride=s,
                    use_se=se,
                    act=nl))
            inplanes = make_divisible(scale * c)
            i += 1
        block_list.append(
            ConvBNLayer(
                in_channels=inplanes,
                out_channels=make_divisible(scale * cls_ch_squeeze),
                kernel_size=1,
                stride=1,
                padding=0,
                groups=1,
                if_act=True,
                act='hardswish'))
        self.stages.append(nn.Sequential(*block_list))
        self.out_channels.append(make_divisible(scale * cls_ch_squeeze))
        for i, stage in enumerate(self.stages):
            self.add_sublayer(sublayer=stage, name="stage{}".format(i))

    def forward(self, x):
        x = self.conv(x)
        out_list = []
        for stage in self.stages:
            x = stage(x)
            out_list.append(x)
        return out_list


class ConvBNLayer(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride,
                 padding,
                 groups=1,
                 if_act=True,
                 act=None):
        super(ConvBNLayer, self).__init__()
        self.if_act = if_act
        self.act = act
        self.conv = nn.Conv2D(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias_attr=False)

        self.bn = nn.BatchNorm(num_channels=out_channels, act=None)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        if self.if_act:
            if self.act == "relu":
                x = F.relu(x)
            elif self.act == "hardswish":
                x = F.hardswish(x)
            else:
                # Raise instead of print + exit() so callers can handle the error
                raise ValueError(
                    "The activation function({}) is selected incorrectly.".
                    format(self.act))
        return x


class ResidualUnit(nn.Layer):
    def __init__(self,
                 in_channels,
                 mid_channels,
                 out_channels,
                 kernel_size,
                 stride,
                 use_se,
                 act=None):
        super(ResidualUnit, self).__init__()
        self.if_shortcut = stride == 1 and in_channels == out_channels
        self.if_se = use_se

        self.expand_conv = ConvBNLayer(
            in_channels=in_channels,
            out_channels=mid_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            if_act=True,
            act=act)
        self.bottleneck_conv = ConvBNLayer(
            in_channels=mid_channels,
            out_channels=mid_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=int((kernel_size - 1) // 2),
            groups=mid_channels,
            if_act=True,
            act=act)
        if self.if_se:
            self.mid_se = SEModule(mid_channels)
        self.linear_conv = ConvBNLayer(
            in_channels=mid_channels,
            out_channels=out_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            if_act=False,
            act=None)

    def forward(self, inputs):
        x = self.expand_conv(inputs)
        x = self.bottleneck_conv(x)
        if self.if_se:
            x = self.mid_se(x)
        x = self.linear_conv(x)
        if self.if_shortcut:
            x = paddle.add(inputs, x)
        return x


class SEModule(nn.Layer):
    def __init__(self, in_channels, reduction=4):
        super(SEModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.conv1 = nn.Conv2D(
            in_channels=in_channels,
            out_channels=in_channels // reduction,
            kernel_size=1,
            stride=1,
            padding=0)
        self.conv2 = nn.Conv2D(
            in_channels=in_channels // reduction,
            out_channels=in_channels,
            kernel_size=1,
            stride=1,
            padding=0)

    def forward(self, inputs):
        outputs = self.avg_pool(inputs)
        outputs = self.conv1(outputs)
        outputs = F.relu(outputs)
        outputs = self.conv2(outputs)
        outputs = F.hardsigmoid(outputs, slope=0.2, offset=0.5)
        return inputs * outputs


class DBFPN(nn.Layer):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(DBFPN, self).__init__()
        self.out_channels = out_channels
        weight_attr = paddle.nn.initializer.KaimingUniform()

        self.in2_conv = nn.Conv2D(
            in_channels=in_channels[0],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in3_conv = nn.Conv2D(
            in_channels=in_channels[1],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in4_conv = nn.Conv2D(
            in_channels=in_channels[2],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in5_conv = nn.Conv2D(
            in_channels=in_channels[3],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p5_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p4_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p3_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p2_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)

    def forward(self, x):
        c2, c3, c4, c5 = x

        in5 = self.in5_conv(c5)
        in4 = self.in4_conv(c4)
        in3 = self.in3_conv(c3)
        in2 = self.in2_conv(c2)

        out4 = in4 + F.upsample(
            in5, scale_factor=2, mode="nearest", align_mode=1)  # 1/16
        out3 = in3 + F.upsample(
            out4, scale_factor=2, mode="nearest", align_mode=1)  # 1/8
        out2 = in2 + F.upsample(
            out3, scale_factor=2, mode="nearest", align_mode=1)  # 1/4

        p5 = self.p5_conv(in5)
        p4 = self.p4_conv(out4)
        p3 = self.p3_conv(out3)
        p2 = self.p2_conv(out2)
        p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
        p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
        p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)

        fuse = paddle.concat([p5, p4, p3, p2], axis=1)
        return fuse




def get_bias_attr(k):
    stdv = 1.0 / math.sqrt(k * 1.0)
    initializer = paddle.nn.initializer.Uniform(-stdv, stdv)
    bias_attr = ParamAttr(initializer=initializer)
    return bias_attr


class Head(nn.Layer):
    def __init__(self, in_channels, name_list):
        super(Head, self).__init__()
        self.conv1 = nn.Conv2D(
            in_channels=in_channels,
            out_channels=in_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(),
            bias_attr=False)
        self.conv_bn1 = nn.BatchNorm(
            num_channels=in_channels // 4,
            param_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1.0)),
            bias_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1e-4)),
            act='relu')
        self.conv2 = nn.Conv2DTranspose(
            in_channels=in_channels // 4,
            out_channels=in_channels // 4,
            kernel_size=2,
            stride=2,
            weight_attr=ParamAttr(
                initializer=paddle.nn.initializer.KaimingUniform()),
            bias_attr=get_bias_attr(in_channels // 4))
        self.conv_bn2 = nn.BatchNorm(
            num_channels=in_channels // 4,
            param_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1.0)),
            bias_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1e-4)),
            act="relu")
        self.conv3 = nn.Conv2DTranspose(
            in_channels=in_channels // 4,
            out_channels=1,
            kernel_size=2,
            stride=2,
            weight_attr=ParamAttr(
                initializer=paddle.nn.initializer.KaimingUniform()),
            bias_attr=get_bias_attr(in_channels // 4), )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv_bn1(x)
        x = self.conv2(x)
        x = self.conv_bn2(x)
        x = self.conv3(x)
        x = F.sigmoid(x)
        return x


class DBHead(nn.Layer):
    """
    Differentiable Binarization (DB) for text detection:
        see https://arxiv.org/abs/1911.08947
    args:
        in_channels(int): number of channels of the FPN output
        k(int): amplification factor of the differentiable binarization
    """

    def __init__(self, in_channels, k=50, **kwargs):
        super(DBHead, self).__init__()
        self.k = k
        binarize_name_list = [
            'conv2d_56', 'batch_norm_47', 'conv2d_transpose_0', 'batch_norm_48',
            'conv2d_transpose_1', 'binarize'
        ]
        thresh_name_list = [
            'conv2d_57', 'batch_norm_49', 'conv2d_transpose_2', 'batch_norm_50',
            'conv2d_transpose_3', 'thresh'
        ]
        self.binarize = Head(in_channels, binarize_name_list)
        self.thresh = Head(in_channels, thresh_name_list)

    def step_function(self, x, y):
        return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))

    def forward(self, x, targets=None):
        shrink_maps = self.binarize(x)
        if not self.training:
            return {'maps': shrink_maps}

        threshold_maps = self.thresh(x)
        binary_maps = self.step_function(shrink_maps, threshold_maps)
        y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
        return {'maps': y}



if __name__ == '__main__':
    fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")

    # Build the backbone
    model_backbone = MobileNetV3()

    # Shapes of the four backbone outputs for this input:
    # The index is  0 and the shape of output is  [1, 16, 160, 160]
    # The index is  1 and the shape of output is  [1, 24, 80, 80]
    # The index is  2 and the shape of output is  [1, 56, 40, 40]
    # The index is  3 and the shape of output is  [1, 480, 20, 20]
    in_channels = model_backbone.out_channels

    # Build the FPN
    model_fpn = DBFPN(in_channels=in_channels, out_channels=256)

    # Print the FPN structure
    print(model_fpn)
    # DBFPN(
    #   (in2_conv): Conv2D(16, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (in3_conv): Conv2D(24, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (in4_conv): Conv2D(56, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (in5_conv): Conv2D(480, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (p5_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #   (p4_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #   (p3_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #   (p2_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    # )
    # Compute the FPN output
    outs = model_backbone(fake_inputs)
    fpn_outs = model_fpn(outs)
    # The shape of fpn outs [1, 256, 160, 160]

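    # Build the head. A new layer is in training mode by default,
    # so its forward pass returns all three maps.
    model_db_head = DBHead(in_channels=256)

    # Print the DB head structure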
    print(model_db_head)
    # DBHead(
    #   (binarize): Head(
    #     (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #     (conv_bn1): BatchNorm()
    #     (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #     (conv_bn2): BatchNorm()
    #     (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #   )
    #   (thresh): Head(
    #     (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #     (conv_bn1): BatchNorm()
    #     (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #     (conv_bn2): BatchNorm()
    #     (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #   )
    # )
    # Compute the head output
    db_head_outs = model_db_head(fpn_outs)
    print(f"The shape of fpn outs {fpn_outs.shape}")
    # The shape of fpn outs [1, 256, 160, 160]
    print(f"The shape of DB head outs {db_head_outs['maps'].shape}")
    # The shape of DB head outs [1, 3, 640, 640]

Output:

DBFPN(
  (in2_conv): Conv2D(16, 256, kernel_size=[1, 1], data_format=NCHW)
  (in3_conv): Conv2D(24, 256, kernel_size=[1, 1], data_format=NCHW)
  (in4_conv): Conv2D(56, 256, kernel_size=[1, 1], data_format=NCHW)
  (in5_conv): Conv2D(480, 256, kernel_size=[1, 1], data_format=NCHW)
  (p5_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p4_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p3_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p2_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
)
DBHead(
  (binarize): Head(
    (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    (conv_bn1): BatchNorm()
    (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    (conv_bn2): BatchNorm()
    (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
  )
  (thresh): Head(
    (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    (conv_bn1): BatchNorm()
    (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    (conv_bn2): BatchNorm()
    (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
  )
)
The shape of fpn outs [1, 256, 160, 160]
The shape of DB head outs [1, 3, 640, 640]
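
The [1, 3, 640, 640] shape follows from two things visible in the listing: each Head branch applies two transposed convolutions with kernel size 2 and stride 2, and DBHead concatenates three single-channel maps in training mode (the default mode of a newly created layer). The sketch below redoes that arithmetic in plain Python. `conv_transpose_out` and `head_output_shape` are names invented here, and the size formula assumes no output padding and no dilation.

```python
def conv_transpose_out(size, kernel=2, stride=2, padding=0):
    """Spatial output size of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


def head_output_shape(fpn_hw=160, training=True):
    """Shape of DBHead's 'maps' output for a square FPN feature map."""
    # conv1 (3x3, padding 1) keeps the size; conv2 and conv3 each double it.
    side = conv_transpose_out(conv_transpose_out(fpn_hw))
    # Training: shrink map + threshold map + binary map; inference: shrink map only.
    channels = 3 if training else 1
    return [1, channels, side, side]


print(head_output_shape(training=True))
print(head_output_shape(training=False))
```

If `model_db_head.eval()` were called before the forward pass, the listing's `forward` would return only the shrink (probability) map, so the output would have a single channel.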

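Strengths of the DB algorithm (it is supervised, and a ResNet50 backbone gives better results):

  • Higher accuracy and faster inference
  • Handles curved text
  • Handles multi-oriented text
  • Handles multilingual text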
