Converting color (RGB) or grayscale (Gray) images to tensor data (with img2tensor code)

2025/4/15 9:40:14


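💪 I work in image processing professionally and love it. My image processing columns are listed below 👇:
📝 Image Denoising
📝 Super-Resolution Reconstruction
📝 Semantic Segmentation
📝 Style Transfer
📝 Object Detection
📝 Low-Light Enhancement
📝 Model Optimization
📝 Hands-On Model Deployment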


1. Error: IndexError: tuple index out of range

While converting a grayscale image to tensor data, I hit the error IndexError: tuple index out of range.

1.1 Problem analysis

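IndexError: tuple index out of range is raised when code indexes a tuple at a position that does not exist. Here the tuple is img.shape: my grayscale image was a 2-D array of shape (h, w), so it has no third dimension (the channel count), and accessing img.shape[2] fails. When passed to img2tensor, a grayscale image must have the same layout as a color image, namely height, width, channels (h, w, c). The channel count is 1 for a grayscale image and usually 3 for a color image (RGB or BGR).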
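The failure can be reproduced with NumPy alone. The sketch below is only an illustration: the helper name channel_count and the small array sizes are made up for this example and are not part of basicsr.

```python
import numpy as np

gray = np.zeros((4, 5), dtype=np.uint8)       # grayscale image: shape (h, w)
color = np.zeros((4, 5, 3), dtype=np.uint8)   # color image: shape (h, w, c)

def channel_count(img):
    """Return img.shape[2], or None when the array has no third axis."""
    try:
        return img.shape[2]
    except IndexError:
        # img.shape is a 2-tuple for a 2-D array, so index 2 does not exist
        return None

print(gray.shape, channel_count(gray))
print(color.shape, channel_count(color))
```

The 2-D array takes the except branch, which is the same lookup that fails inside img2tensor.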

The img2tensor function I call comes from the basicsr library. Its original definition in basicsr is shown below:

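import cv2    # required by the definition below
import torch  # required by the definition below
from basicsr.utils import img2tensor, tensor2img  # how I import it in my own code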

def img2tensor(imgs, bgr2rgb=True, float32=True):
    """Numpy array to tensor.

    Args:
        imgs (list[ndarray] | ndarray): Input images.
        bgr2rgb (bool): Whether to change bgr to rgb.
        float32 (bool): Whether to change to float32.

    Returns:
        list[tensor] | tensor: Tensor images. If returned results only have
            one element, just return tensor.
    """

    def _totensor(img, bgr2rgb, float32):
        if img.shape[2] == 3 and bgr2rgb:
            if img.dtype == 'float64':
                img = img.astype('float32')
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = torch.from_numpy(img.transpose(2, 0, 1))
        if float32:
            img = img.float()
        return img

    if isinstance(imgs, list):
        return [_totensor(img, bgr2rgb, float32) for img in imgs]
    else:
        return _totensor(imgs, bgr2rgb, float32)

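As the code shows, it assumes a three-dimensional (h, w, c) input and does not account for single-channel grayscale images stored as 2-D arrays. Below is my improved version, which handles both color and grayscale images.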

2. Converting three-channel or single-channel images to tensors

Building on the code above, I added a check for whether the input is a single-channel image. If it is, one extra dimension is added.
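The dimension fix itself can be sketched in NumPy before any tensor is involved. This is a minimal illustration with a made-up 3x4 array; the variable names gray, hwc and chw are assumptions for this example only.

```python
import numpy as np

gray = np.arange(12, dtype=np.float32).reshape(3, 4)   # 2-D grayscale image: (h, w)

hwc = np.expand_dims(gray, axis=2)   # add a channel axis at the end: (h, w, 1)
chw = hwc.transpose(2, 0, 1)         # move channels first: (1, h, w)

print(gray.shape, hwc.shape, chw.shape)
```

Once the array is (h, w, 1), the img.shape[2] lookup succeeds and the pixel values are unchanged.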

2.1 Code

import numpy as np
import torch

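def _totensor(imgs, bgr2rgb=True, float32=True):               # helper: converts one image or a list of images from NumPy arrays to PyTorch tensors
    def _convert(img):
        if img.ndim == 2:                                      # a 2-D image (grayscale, height and width only) gets an extra third (channel) dimension
            img = np.expand_dims(img, axis=2)                  # insert a new axis at the given position: (h, w) -> (h, w, 1)
        if bgr2rgb and img.shape[2] == 3:                      # a color image (third dimension is 3) is converted from BGR to RGB when bgr2rgb is True
            img = img[..., [2, 1, 0]]
        img = torch.from_numpy(np.ascontiguousarray(img))
        if float32:
            img = img.float()                                  # float32=True gives a float32 tensor
        else:
            img = img.byte()                                   # float32=False gives a uint8 tensor; note that float data in [0, 1] is truncated by this cast
        return img.permute(2, 0, 1).contiguous()               # move the channel dimension to the front: (h, w, c) -> (c, h, w)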

    if isinstance(imgs, list):
        return [_convert(img) for img in imgs]
    else:
        return _convert(imgs)

def img2tensor(imgs, bgr2rgb=True, float32=True):                  # main function: converts one image or a list of images from NumPy arrays to PyTorch tensors
    if isinstance(imgs, np.ndarray):                               # input is a single NumPy array
        if imgs.ndim == 2:                                         # grayscale image: add a third (channel) dimension
            imgs = np.expand_dims(imgs, axis=2)
        return _totensor(imgs, bgr2rgb, float32)                   # delegate the actual conversion to _totensor
    elif isinstance(imgs, list):                                   # input is a list of images: apply the same check to each one
        # build a new list so the caller's list is not modified in place
        imgs = [np.expand_dims(img, axis=2) if img.ndim == 2 else img for img in imgs]
        return _totensor(imgs, bgr2rgb, float32)
    else:
        raise TypeError("Input should be a numpy array or list of numpy arrays")

# Example call
img_gt = np.random.rand(256, 256)  # grayscale image
print(f"Shape of img_gt: {img_gt.shape}")
img_gt_tensor = img2tensor(img_gt)
print(f"Shape of img_gt_tensor: {img_gt_tensor.shape}")             # the input is grayscale, so the converted tensor should have shape (1, 256, 256)
print("img_gt_tensor:",img_gt_tensor)
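The example call above only covers the grayscale path. For the color path, the channel reordering used in _convert can be checked with NumPy alone. The sketch below assumes a tiny made-up 2x2 image that is pure blue in BGR order; it does not call img2tensor and needs no torch.

```python
import numpy as np

bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255                     # channel 0 is blue in BGR order

rgb = bgr[..., [2, 1, 0]]             # same index trick as in _convert: BGR -> RGB
chw = np.ascontiguousarray(rgb.transpose(2, 0, 1))   # (h, w, c) -> (c, h, w)

print(rgb[0, 0])      # blue now sits in the last (B) position of RGB
print(chw.shape)
```

After the swap, the blue value moves from index 0 to index 2, and the channel-first array has shape (3, 2, 2).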