A Fine-Grained Recognition and Analysis System for 200 Bird Species, Built on the Lightweight Neural Network GhostNet

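A recent project calls for a lightweight network model, with pruning and quantization planned later to speed up inference. Since I had some spare time, I decided to build a test project on a real dataset. The goal of this post is to build a fine-grained bird recognition system with the lightweight neural network GhostNet. First, a look at the results: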

A quick look at the dataset:

For details on how the dataset is randomly split, see my earlier post. The core code is given directly here:

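import json
import os
import random

# Load, parse, and build the dataset.
# datasetDir is the dataset root (one sub-directory per class, path ending in "/");
# it must be defined before this snippet runs.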
if not os.path.exists("dataset.json"):
    train_dataset = []
    test_dataset = []
    all_dataset = []
    classes_list = os.listdir(datasetDir)
    classes_list.sort()
    print("classes_list: ", classes_list)
    os.makedirs("weights", exist_ok=True)  # make sure the output directory exists
    with open("weights/classes.txt","w") as f:
        for one_label in classes_list:
            f.write(one_label.strip()+"\n")
    print("classes file write success!")
    num_classes=len(classes_list)
    for one_label in os.listdir(datasetDir):
        oneDir = datasetDir + one_label + "/"
        for one_pic in os.listdir(oneDir):
            one_path = oneDir + one_pic
            try:
                one_ind = classes_list.index(one_label)
                all_dataset.append([one_ind, one_path])
            except ValueError:
                # Label is not in classes_list; skip this file.
                pass
    train_ratio = 0.90
    train_num = int(train_ratio * len(all_dataset))
    all_inds = list(range(len(all_dataset)))
    train_inds = random.sample(all_inds, train_num)
    train_ind_set = set(train_inds)  # set lookup keeps the filter below linear
    test_inds = [one for one in all_inds if one not in train_ind_set]
    for one_ind in train_inds:
        train_dataset.append(all_dataset[one_ind])
    for one_ind in test_inds:
        test_dataset.append(all_dataset[one_ind])
    dataset = {}
    dataset["train"] = train_dataset
    dataset["test"] = test_dataset
    with open("dataset.json", "w") as f:
        f.write(json.dumps(dataset))
else:
    with open("dataset.json") as f:
        dataset = json.load(f)
    train_dataset = dataset["train"]
    test_dataset = dataset["test"]
    with open("weights/classes.txt","r") as f:
        classes_list=[one.strip() for one in f.readlines() if one.strip()]
    print("classes_list: ", classes_list)
    num_classes = len(classes_list)
print("train_dataset_size: ", len(train_dataset))
print("test_dataset_size: ", len(test_dataset))
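The split above samples from the whole dataset at once, so with 200 classes a rare class can end up with few or no test images, and the result changes on every fresh run. As a minimal sketch of an alternative (not the method used in this project), the split can be done class by class with a fixed seed. The function name `stratified_split`, the `seed` parameter, and the toy paths are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_ratio=0.9, seed=0):
    """Split [label, path] pairs into train/test lists, class by class."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    by_label = defaultdict(list)
    for label, path in samples:
        by_label[label].append([label, path])

    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = int(train_ratio * len(items))
        train.extend(items[:n_train])
        test.extend(items[n_train:])
    return train, test


# Toy data: 3 classes with 10 images each (the paths are made up).
toy = [[c, f"class{c}/img{i}.jpg"] for c in range(3) for i in range(10)]
train_split, test_split = stratified_split(toy, train_ratio=0.9, seed=0)
print(len(train_split), len(test_split))  # 27 3
```

The returned lists have the same `[label, path]` layout as `train_dataset` and `test_dataset`, so they could be written to `dataset.json` in the same way.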

Below are some commonly used lightweight convolutional neural network models:

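MobileNet: a lightweight model built on depthwise separable convolution, which cuts parameters and computation and suits image classification and object detection on mobile devices.
ShuffleNet: reduces parameters and computation with pointwise group convolutions, and adds a channel shuffle operation that mixes feature maps across groups so that information can flow between them.
EfficientNet: a family of models that uses compound scaling to scale depth, width, and input resolution in a balanced way, keeping accuracy high while reducing parameters and computation.
MobileNetV3: an improved MobileNet that combines neural architecture search with a revised block structure and activation functions (for example h-swish) to further improve lightweight model performance.
ProxylessNAS: a neural architecture search method that automatically searches for lightweight model structures. It searches directly on the target task and hardware instead of on a proxy task.
SqueezeNet: a very small convolutional network built from Fire modules, each of which combines a squeeze (channel-reducing) convolution with expand convolutions to reduce parameters and computation.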

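These lightweight models have relatively few parameters and a low computational cost, which makes them suitable for resource-constrained devices and scenarios. Each one has its own performance profile and characteristics, so choose the model according to the application requirements and resource limits, and adjust or optimize it for the specific task where needed.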
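To make the saving from depthwise separable convolution (the building block mentioned for MobileNet above) concrete, here is a small weight-count comparison. It is only a sketch: biases and normalization layers are ignored, and the function names and the 128 to 256 channel example are chosen purely for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


std = standard_conv_params(128, 256, 3)
sep = depthwise_separable_params(128, 256, 3)
print(std, sep, round(std / sep, 2))  # 294912 33920 8.69
```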

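GhostNet is a lightweight convolutional neural network designed for efficient image classification and object detection on devices with limited compute. Its core idea is to use the Ghost Module to reduce parameters and computation while keeping performance high under tight resource budgets.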

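The Ghost Module is the key building block of GhostNet. Its main idea is to split an ordinary convolution layer into two parts: a primary convolution and a set of cheap operations (`primary_conv` and `cheap_operation` in the code further down). It works as follows:

Primary convolution: produces only a small number of output channels (the intrinsic feature maps), which can be seen as a reduced representation of what the original convolution would output. Because it has fewer filters, it needs fewer parameters and less computation.
Cheap operations: a depthwise convolution applied to the intrinsic feature maps generates the additional "ghost" feature maps at low cost. The two sets are concatenated along the channel dimension to form the module output, so the model keeps a rich representation with far fewer parameters.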
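The two parts described above can be counted directly. The sketch below mirrors the channel arithmetic of the `GhostModule` class in the listing later in this post (`init_channels`, `new_channels`, `ratio`, `dw_size`) and compares its convolution weights with those of a plain convolution. BatchNorm parameters are ignored, and the helper names and the 64 to 128 channel example are illustrative assumptions.

```python
import math


def ghost_module_params(inp, oup, kernel_size=1, ratio=2, dw_size=3):
    """Conv weights of a Ghost module (BatchNorm parameters ignored)."""
    init_channels = math.ceil(oup / ratio)      # intrinsic feature maps
    new_channels = init_channels * (ratio - 1)  # ghost feature maps
    primary = kernel_size * kernel_size * inp * init_channels
    cheap = dw_size * dw_size * new_channels    # depthwise: one filter per output channel
    return primary + cheap


def plain_conv_params(inp, oup, kernel_size=1):
    """Weights of an ordinary convolution with the same input/output channels."""
    return kernel_size * kernel_size * inp * oup


print(plain_conv_params(64, 128), ghost_module_params(64, 128))  # 8192 4672
```

With `ratio=2` the primary convolution costs half of the plain layer, and the depthwise part adds only a small constant per channel.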

The advantages of GhostNet are:

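Lightweight and efficient: by using the Ghost Module, GhostNet reduces the model's parameters and computation, so it runs faster on devices with limited compute and fits a wider range of applications.
Parameter efficiency: the Ghost Module produces many feature maps from few parameters, which improves parameter utilization. This makes the model easier to scale and better suited to low-power devices and mobile applications.
Accuracy retention: although GhostNet is designed for efficiency, empirical results indicate that on some image classification and object detection tasks its accuracy is comparable or close to that of commonly used larger models.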
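The efficiency claims above can be made a little more precise with the usual back-of-the-envelope FLOPs ratio between an ordinary convolution and a Ghost Module. This is a sketch under stated assumptions: $c$ input channels, $n$ output channels, an $h' \times w'$ output map, a $k \times k$ primary kernel, a $d \times d$ depthwise kernel for the cheap operations, and a ratio $s$ (the `ratio` argument of `GhostModule` in the code below; `dw_size` plays the role of $d$). Normalization and activation costs are ignored.

```latex
r_s = \frac{n \cdot h' \cdot w' \cdot c \cdot k^2}
           {\frac{n}{s} \cdot h' \cdot w' \cdot c \cdot k^2
            + (s-1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d^2}
    = \frac{c \cdot k^2}{\frac{1}{s} \cdot c \cdot k^2 + \frac{s-1}{s} \cdot d^2}
    \approx \frac{s \cdot c}{s + c - 1} \approx s
```

The approximation holds when $d \approx k$ and $s \ll c$. With the defaults used in this post's bottlenecks ($k = 1$, $d = 3$, $s = 2$) the exact ratio is $2c / (c + 9)$, which still approaches 2 as the channel count grows.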

The drawbacks of GhostNet are:

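Space complexity: although GhostNet greatly reduces parameters and computation, the cheap operations and the concatenation create and move extra intermediate feature maps, so its memory footprint and memory traffic are relatively high. On devices with extremely limited resources, inference may therefore be slower than the reduction in computation suggests.
Task-specific limitations: GhostNet is mainly used for image classification and object detection. For other tasks, such as semantic segmentation or instance segmentation, it may need additional customization and changes.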

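In short, GhostNet is a lightweight design that uses the Ghost Module to reduce parameters and computation and to perform better on devices with limited compute. It has some limitations, but it delivers efficient image classification and object detection under resource constraints while keeping reasonable accuracy.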

In past projects I mostly used MobileNet and rarely GhostNet, so building a real project is also a way to get more familiar with the model. The model implementation is as follows:

# encoding:utf-8
from __future__ import division


"""
__Author__:沂水寒城
Purpose: GhostNet
"""


import torch
import torch.nn as nn
import math
import numpy as np
from torch.hub import load_state_dict_from_url
from utils.utils import load_weights_from_state_dict



def _make_divisible(v, divisor, min_value=None):
    """
    Round v to a multiple of divisor without dropping more than 10% below v.
    Reference:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class SELayer(nn.Module):
    """
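    Squeeze-and-Excitation layer: global average pooling, two fully connected
    layers, then channel-wise rescaling with the gate clamped to [0, 1].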
    """
    def __init__(self, channel, reduction=4):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        y = torch.clamp(y, 0, 1)
        return x * y


def depthwise_conv(inp, oup, kernel_size=3, stride=1, relu=False):
    """
    Depthwise convolution (groups=inp) followed by BatchNorm and an optional ReLU.
    """
    return nn.Sequential(
        nn.Conv2d(
            inp, oup, kernel_size, stride, kernel_size // 2, groups=inp, bias=False
        ),
        nn.BatchNorm2d(oup),
        nn.ReLU(inplace=True) if relu else nn.Sequential(),
    )


class GhostModule(nn.Module):
    """
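    Ghost module: a primary convolution produces the intrinsic feature maps, a
    cheap depthwise convolution derives the ghost feature maps from them, and
    the two are concatenated and trimmed to `oup` channels.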
    """
    def __init__(
        self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True
    ):
        super(GhostModule, self).__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)
        new_channels = init_channels * (ratio - 1)

        self.primary_conv = nn.Sequential(
            nn.Conv2d(
                inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False
            ),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

        self.cheap_operation = nn.Sequential(
            nn.Conv2d(
                init_channels,
                new_channels,
                dw_size,
                1,
                dw_size // 2,
                groups=init_channels,
                bias=False,
            ),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1, x2], dim=1)
        return out[:, : self.oup, :, :]


class GhostBottleneck(nn.Module):
    """
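    Ghost bottleneck: GhostModule (expand) -> stride-2 depthwise conv (only
    when stride == 2) -> optional SE -> GhostModule (project), added to a
    shortcut branch.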
    """
    def __init__(self, inp, hidden_dim, oup, kernel_size, stride, use_se):
        super(GhostBottleneck, self).__init__()
        assert stride in [1, 2]
        self.conv = nn.Sequential(
            GhostModule(inp, hidden_dim, kernel_size=1, relu=True),
            depthwise_conv(hidden_dim, hidden_dim, kernel_size, stride, relu=False)
            if stride == 2
            else nn.Sequential(),
            SELayer(hidden_dim) if use_se else nn.Sequential(),
            GhostModule(hidden_dim, oup, kernel_size=1, relu=False),
        )
        if stride == 1 and inp == oup:
            self.shortcut = nn.Sequential()
        else:
            self.shortcut = nn.Sequential(
                depthwise_conv(inp, inp, 3, stride, relu=True),
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        return self.conv(x) + self.shortcut(x)


class GhostNet(nn.Module):
    """
    GhostNet
    """
    def __init__(self, cfgs, num_classes=1000, width_mult=1.0):
        super(GhostNet, self).__init__()
        self.cfgs = cfgs
        output_channel = _make_divisible(16 * width_mult, 4)
        layers = [
            nn.Sequential(
                nn.Conv2d(3, output_channel, 3, 2, 1, bias=False),
                nn.BatchNorm2d(output_channel),
                nn.ReLU(inplace=True),
            )
        ]
        input_channel = output_channel
        block = GhostBottleneck
        for k, exp_size, c, use_se, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 4)
            hidden_channel = _make_divisible(exp_size * width_mult, 4)
            layers.append(
                block(input_channel, hidden_channel, output_channel, k, s, use_se)
            )
            input_channel = output_channel
        self.features = nn.Sequential(*layers)
        output_channel = _make_divisible(exp_size * width_mult, 4)
        self.squeeze = nn.Sequential(
            nn.Conv2d(input_channel, output_channel, 1, 1, 0, bias=False),
            nn.BatchNorm2d(output_channel),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        input_channel = output_channel
        output_channel = 1280
        self.classifier = nn.Sequential(
            nn.Linear(input_channel, output_channel, bias=False),
            nn.BatchNorm1d(output_channel),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            nn.Linear(output_channel, num_classes),
        )
        self._initialize_weights()

    def forward(self, x, need_fea=False):
        if need_fea:
            features, features_fc = self.forward_features(x, need_fea)
            x = self.classifier(features_fc)
            return features, features_fc, x
        else:
            x = self.forward_features(x)
            x = self.classifier(x)
            return x

    def forward_features(self, x, need_fea=False):
        if need_fea:
            input_size = x.size(2)
            scale = [4, 8, 16, 32]
            features = [None, None, None, None]
            for idx, layer in enumerate(self.features):
                x = layer(x)
                if input_size // x.size(2) in scale:
                    features[scale.index(input_size // x.size(2))] = x
            x = self.squeeze(x)
            return features, x.view(x.size(0), -1)
        else:
            x = self.features(x)
            x = self.squeeze(x)
            return x.view(x.size(0), -1)

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def cam_layer(self):
        return self.features[-1]

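You can pick whichever implementation you prefer from Huawei's open-source repository or from other community projects. There is no need to reimplement the model from scratch; understanding it and applying it is enough. I won't go into more detail here, since many of these projects are already quite complete.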

The model's predictions were also visualized as clusters, as shown below: