PyTorch GPU Environment Setup - Blog Navigation

news2025/3/10 14:54:02

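Installation
- Install Visual Studio (required by CUDA)
- Install CUDA
- Install cuDNN
- Create a PyTorch GPU virtual environment
Testing
- Troubleshooting links
- Build a VGG classification network and train it with CUDA
- Accelerate classification inference with CUDA
- ONNX Runtime GPU inference in C#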

Installation

Install Visual Studio (required by CUDA)

Visual Studio 2017, 2019, or 2022 will all work.

Install CUDA

CUDA and cuDNN installation tutorial (very detailed)
Checking the installed CUDA version
Why nvidia-smi and nvcc -V report different CUDA versions
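The two commands often disagree because they measure different things: nvidia-smi shows the highest CUDA version the installed driver supports, while nvcc -V shows the version of the CUDA toolkit that is actually installed. As a rough sketch, the helper below (cuda_version_line is a name made up for this illustration) runs each tool if it is on PATH and prints the relevant line from its output.

```python
import shutil
import subprocess


def cuda_version_line(command, keyword):
    """Run `command` and return the first output line containing `keyword`.

    Returns None when the executable is not on PATH or the call fails.
    """
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    for line in result.stdout.splitlines():
        if keyword in line:
            return line.strip()
    return None


if __name__ == "__main__":
    # Driver side: the highest CUDA version the driver can run
    print("nvidia-smi:", cuda_version_line(["nvidia-smi"], "CUDA Version"))
    # Toolkit side: the version of the installed CUDA compiler
    print("nvcc -V   :", cuda_version_line(["nvcc", "-V"], "release"))
```

A result of None simply means the tool was not found or printed nothing matching the keyword.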

Install cuDNN

CUDA and cuDNN installation tutorial (very detailed)

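Create a PyTorch GPU virtual environment

1. Create the virtual environment
A dedicated virtual environment is recommended. You can share one environment across projects, but that is discouraged: different projects often need different library versions, and putting everything in one place makes the environment very large, which defeats the point of Python being a small, lightweight scripting language.

The name is arbitrary; here it is cls_py38_gpu.

conda create -n cls_py38_gpu python=3.8

Activate the environment before installing anything into it:

conda activate cls_py38_gpu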

2. Install PyTorch
Make sure to select CUDA as the compute platform; this guide uses CUDA 11.8.
Installing with conda:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Note: if you do not have a proxy or VPN, installing with pip or pip3 is recommended instead. Both executables are located inside your virtual environment.

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Both the conda and the pip download may fail partway through; if that happens, simply rerun the command.
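Once the install finishes, it is worth confirming that the environment really picked up a CUDA build of PyTorch. A minimal sketch (the helper name describe_torch_cuda is only for illustration) that also runs cleanly when torch is not installed:

```python
import importlib.util


def describe_torch_cuda():
    """Collect basic facts about the PyTorch install and its CUDA support."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are present at runtime
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


for key, value in describe_torch_cuda().items():
    print("{}: {}".format(key, value))
```

If built_with_cuda is None, a CPU-only wheel was installed; if it is set but cuda_available is False, the driver or GPU is the more likely culprit.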

Testing

Troubleshooting links

Error when calling CUDA
- Could not locate zlibwapi.dll. Please make sure it is in your library path
- Fix: Could not locate zlibwapi.dll. Please make sure it is in your library path!
Error when exporting to ONNX
- Exporting AdaptiveAvgPool2d to ONNX with ATen fallback produces an error #17377
- Unsupported: ONNX export of operator adaptive_avg_pool2d
Training loss does not decrease, or decreases only slightly
- SGD & Adam optimizers
- Why doesn’t the accuracy when training VGG-16 change much?
ONNX GPU inference
- How do you run a ONNX model on a GPU?
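For the adaptive_avg_pool2d export error, a common workaround is to replace the adaptive layer with a fixed-size average pool when the input resolution is known in advance. The sketch below covers only the simple case where the input size is an exact multiple of the target output size, in which a window equal to the stride gives the same result as adaptive pooling. The function name fixed_pool_params is hypothetical, and whether the swap is suitable depends on your model.

```python
def fixed_pool_params(input_size, output_size):
    """Return (kernel_size, stride) for a plain average pool that matches
    adaptive average pooling along one dimension.

    Only valid when input_size is an exact multiple of output_size.
    """
    if input_size <= 0 or output_size <= 0 or input_size % output_size != 0:
        raise ValueError("input_size must be a positive multiple of output_size")
    stride = input_size // output_size
    return stride, stride


# A 14x14 feature map pooled down to 7x7 uses a 2x2 window with stride 2.
kernel, stride = fixed_pool_params(14, 7)
print(kernel, stride)  # 2 2
```

With these values, nn.AdaptiveAvgPool2d((7, 7)) applied to a 14x14 feature map could be replaced by nn.AvgPool2d(kernel_size=2, stride=2) before exporting.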

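Build a VGG classification network and train it with CUDA

The complete code is in the GitHub repository linked at the end of the article.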


# VGG, MyDataset, get_files, collate_fn and the hyperparameters (lr, epochs,
# batch_size, ...) come from the rest of the project; see the repository.
from datetime import datetime

import torch
from numpy import array
from torch import from_numpy, optim
from torch.nn import CrossEntropyLoss
from torch.utils.data import DataLoader

if __name__ == '__main__':
    # Use the first GPU when CUDA is available, otherwise fall back to the CPU
    if torch.cuda.is_available():
        device = torch.device("cuda:0")
        print("Running on the GPU")
        num_gpu = torch.cuda.device_count()
        print("There are {} GPU(s) on your computer".format(num_gpu))
    else:
        device = torch.device("cpu")
        print("Running on the CPU")

    # Move the model to the chosen device before building the optimizer
    model = VGG(image_channels, num_classes).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

    criterion = CrossEntropyLoss()

    test_list, train_list = get_files(dataset_folder, test_data_ratio)

    train_loader = DataLoader(MyDataset(train_list, transform=None, test=False), batch_size=batch_size, shuffle=True,
                              collate_fn=collate_fn)
    # Evaluation data does not need to be shuffled
    test_loader = DataLoader(MyDataset(test_list, transform=None, test=True), batch_size=batch_size, shuffle=False,
                             collate_fn=collate_fn)
    print("Training samples: {}".format(len(train_list)))
    print("Test samples: {}".format(len(test_list)))
    accuracies = []
    test_loss = []
    train_loss = []
    current_accuracy = 0
    model.train()
    for epoch in range(epochs):
        start_time = datetime.now()
        loss_epoch = 0
        for index, (inputs, target) in enumerate(train_loader):
            # Inputs and labels must live on the same device as the model
            inputs = inputs.to(device)
            target = from_numpy(array(target)).long().to(device)
            output = model(inputs)
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_epoch += loss.item()  # summed over all batches in the epoch

        elapsed = (datetime.now() - start_time).total_seconds()
        print("epoch: {}, elapsed: {:.1f} s".format(epoch, elapsed))
        if (epoch + 1) % train_step_interval == 0:
            print("Epoch: {} \t Loss: {:.6f} ".format(epoch + 1, loss_epoch))
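One caveat about the per-epoch timing: CUDA kernels are launched asynchronously, so a wall-clock measurement is only meaningful once the GPU has finished its queued work. In this loop loss.item() already forces a synchronization on every batch, but when timing code that has no such call, an explicit torch.cuda.synchronize() is the usual approach. A small sketch, where timed_call is a hypothetical helper that accepts any synchronization function:

```python
import time


def timed_call(fn, sync=None):
    """Call fn() and return (result, elapsed_seconds).

    sync is an optional zero-argument function that blocks until pending
    device work is done, e.g. torch.cuda.synchronize when running on a GPU.
    """
    if sync is not None:
        sync()  # drain work queued before the measurement starts
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the work launched by fn() to finish
    return result, time.perf_counter() - start


# CPU-only demonstration; on a GPU one would pass sync=torch.cuda.synchronize
total, seconds = timed_call(lambda: sum(range(1_000_000)))
print(total, "computed in {:.4f} s".format(seconds))
```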

Accelerate classification inference with CUDA

The complete code is in the GitHub repository linked at the end of the article.

# VGG, MyDataset and utils come from the rest of the project; see the repository.
import json

import numpy
import torch
from torch.utils.data import DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if __name__ == '__main__':
    with open('config.json') as f:
        param_dict = json.load(f)
    # Map class index -> class label
    class_dict = dict()
    for i in range(len(param_dict["class_labels"])):
        class_dict[i] = param_dict["class_labels"][i]

    print("test class dict: {}".format(class_dict))
    num_classes = len(param_dict["class_labels"])
    image_channels = param_dict["image_channels"]
    model = VGG(image_channels, num_classes)
    utils.load_model("checkpoints/mnist.pth", model)
    model = model.to(device)
    model.eval()  # inference mode: disables dropout and freezes batch-norm statistics
    print(model)

    test_list = utils.get_allfiles(
        r"I:\test_images_full")

    test_loader = DataLoader(MyDataset(test_list, transform=None, test=True), batch_size=1, shuffle=False,
                             collate_fn=utils.collate_fn)
    correct_num = 0
    step = 0
    total_num = len(test_list)
    with torch.no_grad():  # no gradients are needed for inference
        for item in test_loader:
            image, label = item
            image = image.to(device)
            output = model(image)
            # print(class_dict[torch.argmax(output).item()])  # predicted class name
            # label is a list and must be converted to a tensor; output holds the
            # scores for the n classes, so take the index of the largest one
            target = torch.from_numpy(numpy.array(label)).long().to(device)
            res = torch.eq(target, torch.argmax(output))
            step = step + 1
            if res:
                correct_num = correct_num + 1
            if step % 100 == 0:
                print("{}/{}, current accuracy {:.4f}".format(step, total_num, correct_num / step))
    print("[{}/{}], correct rate: {}".format(correct_num, len(test_list), correct_num / len(test_list)))
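The comparison inside the loop boils down to top-1 accuracy: take the index of the highest score for each sample and compare it with the label. A framework-free sketch of that calculation, using made-up scores and a hypothetical helper name:

```python
def top1_accuracy(score_rows, labels):
    """Fraction of samples whose highest-scoring class index equals the label."""
    if len(score_rows) != len(labels):
        raise ValueError("score_rows and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for scores, label in zip(score_rows, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Three samples, three classes; the second prediction is wrong.
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
print(top1_accuracy(scores, [1, 2, 2]))
```

With tensors, the batched equivalent is (torch.argmax(output, dim=1) == target).sum().item() / len(target), which lets the test loader use a batch size larger than 1.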

ONNX Runtime GPU inference in C#

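// Requires the Microsoft.ML.OnnxRuntime.Gpu NuGet package and
// "using Microsoft.ML.OnnxRuntime;". This fragment sits inside a method
// that returns the InferenceSession.
var useCuda = true;
if (useCuda)
{
    // Run the model on the GPU through the CUDA execution provider
    SessionOptions opts = SessionOptions.MakeSessionOptionWithCudaProvider();
    var session = new InferenceSession(modelPath, opts);
    return session;
}
else
{
    // Default options: run on the CPU
    SessionOptions opts = new();
    var session = new InferenceSession(modelPath, opts);
    return session;
}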
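For comparison, the Python onnxruntime package expresses the same choice through a list of execution providers. The sketch below keeps the selection logic in a plain function (choose_providers is a hypothetical name) so that it runs without a GPU. The commented lines show how it would typically be wired to onnxruntime, with model.onnx as a placeholder path.

```python
def choose_providers(available, use_cuda=True):
    """Return execution providers in priority order, CUDA first when usable."""
    providers = []
    if use_cuda and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")  # always keep a CPU fallback
    return providers


print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(choose_providers(["CPUExecutionProvider"]))

# Typical wiring (requires the onnxruntime-gpu package):
# import onnxruntime as ort
# providers = choose_providers(ort.get_available_providers())
# session = ort.InferenceSession("model.onnx", providers=providers)
```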