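Contents
- Installation
- Install Visual Studio (required by CUDA)
- Install CUDA
- Install cuDNN
- Create a PyTorch GPU virtual environment
- Test
- Troubleshooting links
- Build a VGG classification network and train it with CUDA
- Accelerate classification inference with CUDA
- Run inference from C# with ONNX Runtime GPU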
Installation
Install Visual Studio (required by CUDA)
Visual Studio 2017, 2019, or 2022 will all work.
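Install CUDA
Reference: CUDA and cuDNN installation tutorial (very detailed)
Check the installed CUDA version
Reference: Why the CUDA versions differ: nvidia-smi vs. nvcc -V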
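The two commands can report different numbers without anything being wrong: nvidia-smi shows the highest CUDA version the installed driver supports, while nvcc -V shows the version of the CUDA toolkit that is actually installed. The sketch below reads both side by side. The helper names (parse_nvidia_smi, parse_nvcc, run) are illustrative, and the regular expressions assume the usual "CUDA Version: X.Y" and "release X.Y" output formats.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvidia_smi(text: str) -> Optional[str]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_nvcc(text: str) -> Optional[str]:
    """Extract the 'release X.Y' field from `nvcc -V` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(cmd: list) -> str:
    """Return the command's stdout, or '' if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return ""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=30).stdout
    except (OSError, subprocess.SubprocessError):
        return ""


if __name__ == "__main__":
    print("driver supports CUDA up to:", parse_nvidia_smi(run(["nvidia-smi"])))
    print("installed toolkit (nvcc):", parse_nvcc(run(["nvcc", "-V"])))
```

If the toolkit version is lower than or equal to the driver's version, the pair is normally compatible.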
Install cuDNN
Reference: CUDA and cuDNN installation tutorial (very detailed)
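Create a PyTorch GPU virtual environment
1. Create the virtual environment
A dedicated virtual environment is recommended. You can share one environment across projects, but this is discouraged: different projects often need different library versions, and piling everything into one environment makes it very large, which defeats the point of Python being a lightweight scripting language.
The name is arbitrary; here it is cls_py38_gpu.
conda create -n cls_py38_gpu python=3.8
Activate the environment before installing anything into it:
conda activate cls_py38_gpu
2. Install PyTorch
When choosing the PyTorch build, select CUDA as the compute platform; this guide uses CUDA 11.8.
Install with conda:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Note: if the conda channels are slow or unreachable from your network (for example, when you have no proxy), install with pip or pip3 instead. Both executables live inside your virtual environment.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Either the conda or the pip download may fail partway through; if it does, simply rerun the command.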
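Rerunning a failed download by hand can be automated. The following is a minimal sketch of a retry wrapper around any install command; run_with_retries and its parameters are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay_seconds=5.0):
    """Run cmd until it exits with code 0 or the attempts are used up.

    Returns the number of the attempt that succeeded, or 0 if all failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # short pause before trying again
    return 0


# Example usage with the pip command from this section, run through the
# active environment's interpreter:
# run_with_retries([sys.executable, "-m", "pip", "install",
#                   "torch", "torchvision", "torchaudio",
#                   "--index-url", "https://download.pytorch.org/whl/cu118"])
```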
Test
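A quick way to test the new environment is to ask PyTorch whether it can see the GPU. The sketch below assumes it is run inside the cls_py38_gpu environment; check_gpu_environment is a hypothetical helper name, not something taken from the original article.

```python
import importlib.util


def check_gpu_environment():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
        "gpu_names": [],
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment
    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu_names"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in check_gpu_environment().items():
        print("{}: {}".format(key, value))
```

If cuda_available is False while cuda_build shows a version, the driver or the CUDA/cuDNN installation is the more likely culprit than the PyTorch package.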
Troubleshooting links
Error when calling CUDA
Could not locate zlibwapi.dll. Please make sure it is in your library path
Reference: Fixing "Could not locate zlibwapi.dll. Please make sure it is in your library path!"
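The message means the loader could not find zlibwapi.dll in any directory it searches. Assuming the usual fix of placing the DLL in a directory listed in PATH (for example the CUDA bin directory), the sketch below checks whether any PATH entry already contains it. find_on_path is an illustrative helper name.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every directory listed in PATH that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    locations = find_on_path("zlibwapi.dll")
    print(locations if locations else "zlibwapi.dll is not on PATH")
```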
ONNX export error
Exporting AdaptiveAvgPool2d to ONNX with ATen fallback produces an error #17377
Unsupported: ONNX export of operator adaptive_avg_pool2d
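The linked threads are not reproduced here, so the following is only one commonly suggested workaround, not necessarily the one they recommend: when the input size is fixed and evenly divisible by the requested output size, an adaptive average pool is equivalent to an ordinary average pool whose kernel size and stride are both input_size / output_size, and that fixed pool exports without the adaptive operator. The helper name fixed_pool_params is hypothetical.

```python
def fixed_pool_params(input_size, output_size):
    """Kernel size and stride of a fixed average pool that matches an
    adaptive average pool along one dimension.

    Only the evenly divisible case is handled, because only then do all
    adaptive pooling windows have the same size.
    """
    if output_size <= 0 or input_size <= 0 or input_size % output_size != 0:
        raise ValueError("input_size must be a positive multiple of output_size")
    kernel = input_size // output_size
    stride = kernel
    return kernel, stride


# Example usage (requires PyTorch; shown as a comment to keep this sketch
# dependency-free):
# k, s = fixed_pool_params(14, 7)
# pool = torch.nn.AvgPool2d(kernel_size=k, stride=s)  # replaces AdaptiveAvgPool2d(7)
```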
Training loss does not decrease, or decreases only slightly
SGD & Adam optimizers
Why doesn’t the accuracy when training VGG-16 change much?
ONNX GPU inference
How do you run a ONNX model on a GPU?
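In the Python API of ONNX Runtime, GPU execution is requested by listing the CUDA execution provider ahead of the CPU one when the session is created. A minimal sketch, assuming the onnxruntime-gpu package is installed; choose_providers and create_session are illustrative names.

```python
def choose_providers(available, prefer_cuda=True):
    """Order execution providers so CUDA is tried first and CPU is the fallback."""
    providers = []
    if prefer_cuda and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


def create_session(model_path, prefer_cuda=True):
    """Create an InferenceSession for model_path, on the GPU when possible."""
    # Imported lazily so choose_providers can be used without onnxruntime.
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers(), prefer_cuda)
    return ort.InferenceSession(model_path, providers=providers)
```

If CUDAExecutionProvider is missing from the available providers, the CPU-only onnxruntime package is probably the one installed.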
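Build a VGG classification network and train it with CUDA
The complete code is in the GitHub repository linked at the end of this article.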
from datetime import datetime

import torch
from numpy import array
from torch import from_numpy, optim
from torch.nn import CrossEntropyLoss
from torch.utils.data import DataLoader

# VGG, MyDataset, get_files, collate_fn and the hyperparameters (image_channels,
# num_classes, lr, weight_decay, batch_size, epochs, ...) come from the full repository code.

if __name__ == '__main__':
    # Use the first GPU when CUDA is available, otherwise fall back to the CPU.
    if torch.cuda.is_available():
        device = torch.device("cuda:0")
        print("Running on the GPU")
        num_gpu = torch.cuda.device_count()
        print("There are {} GPU(s) on your computer".format(num_gpu))
    else:
        device = torch.device("cpu")
        print("Running on the CPU")
    # Move the model to the device before building the optimizer.
    model = VGG(image_channels, num_classes).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = CrossEntropyLoss()
    test_list, train_list = get_files(dataset_folder, test_data_ratio)
    train_loader = DataLoader(MyDataset(train_list, transform=None, test=False), batch_size=batch_size, shuffle=True,
                              collate_fn=collate_fn)
    test_loader = DataLoader(MyDataset(test_list, transform=None, test=True), batch_size=batch_size, shuffle=True,
                             collate_fn=collate_fn)
    print("Training set size: {}".format(len(train_list)))
    print("Test set size: {}".format(len(test_list)))
    accuracies = []
    test_loss = []
    train_loss = []
    current_accuracy = 0
    model.train()
    for epoch in range(epochs):
        start_time = datetime.now()
        loss_epoch = 0
        for index, (inputs, target) in enumerate(train_loader):
            # Inputs and labels must live on the same device as the model.
            inputs = inputs.to(device)
            target = from_numpy(array(target)).long().to(device)
            output = model(inputs)
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_epoch += loss.item()
        end_time = datetime.now()
        print("epoch: {}, elapsed: {:.1f} s".format(epoch, (end_time - start_time).total_seconds()))
        if (epoch + 1) % train_step_interval == 0:
            print("Epoch: {} \t Loss: {:.6f} ".format(epoch + 1, loss_epoch))
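The training script relies on get_files(dataset_folder, test_data_ratio) to divide the dataset into a test list and a training list, but that function lives in the repository and is not shown here. As a rough illustration of what such a split can look like, the sketch below shuffles a list of file paths reproducibly and cuts it by ratio. split_files is a hypothetical stand-in, not the author's implementation.

```python
import random


def split_files(files, test_ratio, seed=0):
    """Shuffle `files` reproducibly and split off `test_ratio` of them for testing.

    Returns (test_list, train_list), mirroring the order in which the
    training script unpacks the result of get_files.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split stable
    test_count = int(len(shuffled) * test_ratio)
    return shuffled[:test_count], shuffled[test_count:]
```

Keeping the seed fixed means the same images stay in the test set between runs, so accuracy numbers remain comparable.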
Accelerate classification inference with CUDA
The complete code is in the GitHub repository linked at the end of this article.
import json

import numpy
import torch
from torch.utils.data import DataLoader

# VGG, MyDataset and the utils module come from the full repository code.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if __name__ == '__main__':
    with open('config.json') as f:
        param_dict = json.load(f)
    # Map class index -> class label.
    class_dict = dict()
    for i in range(len(param_dict["class_labels"])):
        class_dict[i] = param_dict["class_labels"][i]
    print("test class dict: {}".format(class_dict))
    num_classes = len(param_dict["class_labels"])
    image_channels = param_dict["image_channels"]
    model = VGG(image_channels, num_classes)
    utils.load_model("checkpoints/mnist.pth", model)
    model = model.to(device)
    # Evaluation mode: layers such as dropout and batch norm switch to inference behavior.
    model.eval()
    print(model)
    test_list = utils.get_allfiles(
        r"I:\test_images_full")
    test_loader = DataLoader(MyDataset(test_list, transform=None, test=True), batch_size=1, shuffle=True,
                             collate_fn=utils.collate_fn)
    correct_num = 0
    step = 0
    total_num = len(test_list)
    # no_grad skips gradient tracking, which saves memory and time during inference.
    with torch.no_grad():
        for item in test_loader:
            image, label = item
            image = image.to(device)
            output = model(image)
            # print(class_dict.__getitem__(numpy.argmax(output.cpu().numpy())))
            # label is a list and must be converted to a tensor; output holds the scores
            # for the n classes, so take the index of the largest score.
            res = torch.eq(torch.from_numpy(numpy.array(label)).long().to(device), torch.argmax(output))
            step = step + 1
            if res.item():
                correct_num = correct_num + 1
            if step % 100 == 0:
                print("{}/{}, current accuracy {:.4f}".format(step, total_num, correct_num / step))
    print("[{}/{}], correct rate: {}".format(correct_num, len(test_list), correct_num / len(test_list)))
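The inference script compares the index of the largest score with the label, and class_dict is what turns that index back into a readable class name. The dependency-free sketch below shows that mapping on plain Python lists. build_class_dict, predict_label and the sample labels are hypothetical, chosen only for illustration.

```python
def build_class_dict(class_labels):
    """Map class index -> label, equivalent to the loop that fills class_dict."""
    return dict(enumerate(class_labels))


def predict_label(scores, class_dict):
    """Return the label whose score is highest (argmax over one row of scores)."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return class_dict[best_index]


if __name__ == "__main__":
    labels = ["cat", "dog", "bird"]  # made-up labels, not from config.json
    print(predict_label([0.1, 2.3, -0.5], build_class_dict(labels)))  # prints "dog"
```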
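Run inference from C# with ONNX Runtime GPU
// Set to false to force CPU inference.
var useCuda = true;
if (useCuda)
{
    // Requires the Microsoft.ML.OnnxRuntime.Gpu package; uses CUDA device 0 by default.
    SessionOptions opts = SessionOptions.MakeSessionOptionWithCudaProvider();
    var session = new InferenceSession(modelPath, opts);
    return session;
}
else
{
    // Default session options run on the CPU execution provider.
    SessionOptions opts = new();
    var session = new InferenceSession(modelPath, opts);
    return session;
}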