[6.S965 Fall 2022] Efficiency Metrics for Deep Learning

news2025/7/14 2:10:55


The two core metrics are computation and memory.
The three dimensions to consider are storage, latency, and energy.

Latency

$\text{Latency} = \max(T_{\text{operation}}, T_{\text{memory}})$
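As a rough illustration of this formula, the sketch below estimates latency as the larger of a compute time and a memory time. The function name `estimate_latency`, its parameters, and the hardware numbers are hypothetical. The model also assumes that computation and data movement overlap perfectly, which real hardware only approximates.

```python
def estimate_latency(num_ops: float, bytes_moved: float,
                     ops_per_second: float, bytes_per_second: float) -> float:
    """Return max(T_operation, T_memory) in seconds (idealized, fully overlapped)."""
    t_operation = num_ops / ops_per_second      # time spent computing
    t_memory = bytes_moved / bytes_per_second   # time spent moving data
    return max(t_operation, t_memory)


# Hypothetical workload and hardware numbers, chosen only for illustration.
latency = estimate_latency(num_ops=2e9, bytes_moved=1e8,
                           ops_per_second=1e12, bytes_per_second=1e10)
print(f"{latency * 1e3:.1f} ms")
```

With these made-up numbers the memory time (10 ms) exceeds the compute time (2 ms), so the workload is memory-bound and a faster processor alone would not reduce latency.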

Energy


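Memory access costs more energy than computation. Ranked by energy per operation:
DRAM Access > SRAM Access > FP Mult > INT Mult > Register File > FP Add > INT Add
We should therefore avoid data movement: the more data is moved, the more memory references are made, and the more energy is consumed.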
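To give a sense of scale for this ranking, the sketch below uses approximate per-operation energies that are often quoted for 32-bit operations on a 45 nm process. These figures do not appear in the text above, so treat them as assumptions for illustration only.

```python
# Approximate energy per 32-bit operation in picojoules. These are commonly
# quoted 45 nm estimates and are an assumption here, not values from the text.
ENERGY_PJ = {
    "DRAM access": 640.0,
    "SRAM access": 5.0,
    "FP mult": 3.7,
    "INT mult": 3.1,
    "Register file": 1.0,
    "FP add": 0.9,
    "INT add": 0.1,
}

# Sorting by energy reproduces the ranking listed above.
ranking = sorted(ENERGY_PJ, key=ENERGY_PJ.get, reverse=True)
print(" > ".join(ranking))

# How many INT adds cost about as much energy as a single DRAM access?
dram_vs_int_add = ENERGY_PJ["DRAM access"] / ENERGY_PJ["INT add"]
print(f"1 DRAM access ~ {dram_vs_int_add:.0f} INT adds")
```

Under these assumed figures a single DRAM access costs as much as thousands of integer additions, which is why reducing data movement matters more than shaving arithmetic.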

Memory-related metrics

Number of parameters (#Parameters)

Linear: $c_o \times c_i$
Convolution: $c_o c_i k_h k_w$
Grouped convolution: $c_o c_i k_h k_w / g$
Depthwise convolution: $c_o k_w k_h$
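The parameter formulas above can be sketched as plain Python functions. The function names are hypothetical, bias terms are ignored (as in the formulas), and `c_i` is assumed to be divisible by the number of groups.

```python
def linear_params(c_i: int, c_o: int) -> int:
    """Weights of a linear layer: c_o * c_i."""
    return c_o * c_i


def conv_params(c_i: int, c_o: int, k_h: int, k_w: int, groups: int = 1) -> int:
    """Weights of a (grouped) convolution: c_o * c_i * k_h * k_w / g."""
    # Each output channel only sees c_i / groups input channels.
    return c_o * (c_i // groups) * k_h * k_w


def depthwise_conv_params(c: int, k_h: int, k_w: int) -> int:
    """Depthwise convolution is a grouped convolution with g = c_i = c_o."""
    return conv_params(c, c, k_h, k_w, groups=c)


print(linear_params(4096, 1000))
print(conv_params(64, 128, 3, 3))
print(conv_params(64, 128, 3, 3, groups=4))
print(depthwise_conv_params(128, 3, 3))
```

Writing the depthwise case as a grouped convolution makes it clear why its count reduces to $c_o k_w k_h$: each output channel sees exactly one input channel.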

Model size

$\text{Model Size} = \#\text{Parameters} \times \text{Bit Width}$

For example, AlexNet has 61M parameters, so its model size is 244 MB in FP32 and 61 MB in INT8.
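The AlexNet figures can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name `model_size_bytes` is hypothetical, and MB is taken to mean $10^6$ bytes.

```python
def model_size_bytes(num_params: int, bit_width: int) -> float:
    """Model size = #Parameters * bit width, converted from bits to bytes."""
    return num_params * bit_width / 8


alexnet_params = 61_000_000  # 61M parameters, as stated above

fp32_mb = model_size_bytes(alexnet_params, 32) / 1e6
int8_mb = model_size_bytes(alexnet_params, 8) / 1e6
print(f"FP32: {fp32_mb:.0f} MB, INT8: {int8_mb:.0f} MB")
```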

Number of activations (#Activations)

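#Activations, not #Parameters, is the memory bottleneck for inference on IoT devices.
During training, too, the memory bottleneck is the activations rather than the parameters.
- MCUNet: from the input layer to the output layer, the share of memory taken by activations shrinks while the share taken by weights grows, because the number of channels increases.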
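The trend described for MCUNet can be illustrated with two made-up convolution layers, one near the input and one near the output. The layer shapes below are hypothetical and are not taken from MCUNet. The sketch counts only output activations and weights, ignoring biases.

```python
def conv_output_activations(c_o: int, h_o: int, w_o: int) -> int:
    """Number of values in the output feature map."""
    return c_o * h_o * w_o


def conv_weights(c_i: int, c_o: int, k_h: int, k_w: int) -> int:
    """Number of weights in a standard convolution (bias ignored)."""
    return c_o * c_i * k_h * k_w


# Hypothetical layers: (c_i, c_o, kernel size, h_o, w_o).
layers = {
    "early": (3, 16, 3, 112, 112),   # few channels, high resolution
    "late": (160, 320, 3, 7, 7),     # many channels, low resolution
}

stats = {}
for name, (c_i, c_o, k, h_o, w_o) in layers.items():
    acts = conv_output_activations(c_o, h_o, w_o)
    weights = conv_weights(c_i, c_o, k, k)
    stats[name] = (acts, weights)
    print(f"{name}: activations={acts}, weights={weights}")
```

In this toy setup the early layer is dominated by activations and the late layer by weights, matching the direction of the trend described above.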

Computation-related metrics

MACs: multiply-accumulate operations

One multiply-accumulate (MAC) operation is $a \leftarrow a + b \times c$.
MAC counts for some common operations:
- Matrix-vector multiplication (MV): $m\times n$
- General matrix-matrix multiplication (GEMM): $m\times n\times k$
- Linear layer: $c_o\times c_i$
- Convolution: $c_i\times k_w \times k_h \times h_o \times w_o \times c_o$
- Grouped convolution: $c_i\times k_w \times k_h \times h_o \times w_o \times c_o / g$
- Depthwise convolution: $k_w \times k_h \times h_o \times w_o \times c_o$
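The MAC formulas in this list can be sketched as plain Python functions. The function names are hypothetical, the batch size is assumed to be 1, and `c_i` is assumed to be divisible by the number of groups.

```python
def gemm_macs(m: int, n: int, k: int) -> int:
    """General matrix-matrix multiplication: m * n * k."""
    return m * n * k


def linear_macs(c_i: int, c_o: int) -> int:
    """Linear layer on one input vector: c_o * c_i."""
    return c_o * c_i


def conv_macs(c_i: int, c_o: int, k_h: int, k_w: int,
              h_o: int, w_o: int, groups: int = 1) -> int:
    """(Grouped) convolution: c_i * k_w * k_h * h_o * w_o * c_o / g."""
    return (c_i // groups) * k_h * k_w * h_o * w_o * c_o


def depthwise_conv_macs(c: int, k_h: int, k_w: int, h_o: int, w_o: int) -> int:
    """Depthwise convolution is a grouped convolution with g = c_i = c_o."""
    return conv_macs(c, c, k_h, k_w, h_o, w_o, groups=c)


print(conv_macs(64, 128, 3, 3, 56, 56))
print(depthwise_conv_macs(128, 3, 3, 56, 56))
```

A useful sanity check is that a convolution's MAC count equals its weight count multiplied by the number of output positions $h_o \times w_o$, since every weight is used once per output position.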

FLOP: floating point operation

$1\ \text{MAC} = 2\ \text{FLOP}$
- For example, AlexNet has 724M MACs, which corresponds to about 1.4G FLOP.

Floating point operations per second (FLOPS)
- $\text{FLOPS} = \frac{\text{FLOP}}{\text{second}}$
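The difference between FLOP (an amount of work) and FLOPS (a rate) can be made concrete with a short calculation. The helper names and the 1 TFLOPS device are hypothetical. The resulting time is an ideal lower bound on compute time, not a latency prediction.

```python
def macs_to_flop(macs: int) -> int:
    """One MAC is one multiply plus one add, i.e. 2 FLOP."""
    return 2 * macs


def ideal_compute_time(flop: float, flops: float) -> float:
    """Seconds needed to execute `flop` operations at a rate of `flops` per second."""
    return flop / flops


alexnet_macs = 724_000_000               # 724M MACs, as stated above
alexnet_flop = macs_to_flop(alexnet_macs)
print(f"{alexnet_flop / 1e9:.2f} GFLOP")

# Hypothetical device sustaining 1 TFLOPS (1e12 FLOP per second).
seconds = ideal_compute_time(alexnet_flop, 1e12)
print(f"{seconds * 1e3:.3f} ms")
```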

Implementing these metrics in Python

import torch.nn as nn
from torchprofile import profile_macs

def get_model_macs(model, inputs) -> int:
    """count the MACs of one forward pass of model on inputs"""
    return profile_macs(model, inputs)

def get_num_parameters(model: nn.Module, count_nonzero_only=False) -> int:
    """
    calculate the total number of parameters of model
    :param count_nonzero_only: only count nonzero weights
    """
    num_counted_elements = 0
    for param in model.parameters():
        if count_nonzero_only:
            # count_nonzero() returns a 0-dim tensor; convert it to a Python int
            num_counted_elements += int(param.count_nonzero())
        else:
            num_counted_elements += param.numel()
    return num_counted_elements


def get_model_size(model: nn.Module, data_width=32, count_nonzero_only=False) -> int:
    """
    calculate the model size in bits
    :param data_width: #bits per element
    :param count_nonzero_only: only count nonzero weights
    """
    return get_num_parameters(model, count_nonzero_only) * data_width


# unit constants in bits, e.g. get_model_size(model) / MiB gives the size in MiB
Byte = 8
KiB = 1024 * Byte
MiB = 1024 * KiB
GiB = 1024 * MiB
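Because `get_model_size` returns bits and the constants above are also expressed in bits, dividing one by the other gives the size in binary units. The sketch below repeats the constants so that it runs on its own. It uses a hypothetical `model_size_bits` helper in place of a real `nn.Module`, with the 61M-parameter AlexNet count from earlier.

```python
Byte = 8                # bits per byte
KiB = 1024 * Byte       # bits per KiB
MiB = 1024 * KiB        # bits per MiB


def model_size_bits(num_params: int, data_width: int = 32) -> int:
    """Same arithmetic as get_model_size, but from a raw parameter count."""
    return num_params * data_width


size_bits = model_size_bits(61_000_000)   # AlexNet in FP32
size_mib = size_bits / MiB
print(f"{size_mib:.2f} MiB")
```

The result is about 232.7 MiB rather than 244 MB, because a MiB is $2^{20}$ bytes while a MB is $10^6$ bytes.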