Server - Configuring a High-Performance PyTorch Training Environment (PyTorch3D and FairScale)

Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://blog.csdn.net/caroline_wendy/article/details/130863537

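PyTorch3D is a PyTorch-based deep learning library for 3D data. It provides efficient, modular, and differentiable components that make 3D deep learning easier. It includes commonly used 3D operators and loss functions, plus a flexible rendering API, with implementations in PyTorch, C++, and CUDA. PyTorch3D also supports heterogeneous batching of 3D inputs of different sizes, such as meshes, point clouds, and voxels. It can be used for a range of 3D deep learning tasks, including 3D shape reconstruction, pose estimation, scene understanding, and image synthesis.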

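FairScale is a PyTorch extension library for high-performance, large-scale training on one or more machines (nodes). It extends core PyTorch functionality and adds some new experimental features.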

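FairScale mainly provides the following parallel training techniques:

ZeRO: a family of techniques that removes redundancy in model state (optimizer state, gradients, parameters) across workers, striking a balance between data parallelism and model parallelism.
Optimizer State Sharding (OSS): shards the optimizer state across GPUs, which substantially reduces per-GPU memory usage.
Sharded Data Parallel (SDP): builds on OSS by also sharding gradients and broadcasting the updated parameters, further improving memory efficiency.
Fully Sharded Data Parallel (FSDP): builds on SDP by also sharding the model parameters, which makes training very large models possible.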
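
To make the memory savings concrete, here is a rough back-of-the-envelope sketch. It assumes mixed-precision Adam training with the byte counts used in the ZeRO analysis (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state). The function name is hypothetical, the mapping of OSS, SDP, and FSDP onto stages 1, 2, and 3 is an illustrative assumption, and activations and temporary buffers are ignored.

```python
def model_state_bytes_per_gpu(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU memory (bytes) for model states under ZeRO-style sharding."""
    # Assumed byte counts: fp16 weights, fp16 gradients, fp32 Adam state.
    params = 2.0 * num_params
    grads = 2.0 * num_params
    optim = 12.0 * num_params
    if stage == 0:  # plain data parallel: everything is replicated
        return params + grads + optim
    if stage == 1:  # optimizer state sharded (the idea behind OSS)
        return params + grads + optim / num_gpus
    if stage == 2:  # gradients sharded as well (the idea behind SDP)
        return params + (grads + optim) / num_gpus
    if stage == 3:  # parameters sharded as well (the idea behind FSDP)
        return (params + grads + optim) / num_gpus
    raise ValueError("stage must be 0, 1, 2 or 3")


if __name__ == "__main__":
    # Example: a 7.5B-parameter model trained on 64 GPUs.
    for stage in range(4):
        gb = model_state_bytes_per_gpu(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions the model states shrink from 120 GB per GPU with no sharding to about 31.4 GB, 16.6 GB, and 1.9 GB as each further piece of state is sharded.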

FairScale is simple to use: wrap your PyTorch model or optimizer in the classes that FairScale provides.

1. Environment Preparation

Start the Docker container that will serve as the runtime environment:

nvidia-docker run -it --name cryoem-[your name] --shm-size 32G -v [nfs]:[nfs] [base image]:[version]

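Note: set the --shm-size (shared memory) option. Docker's default /dev/shm is only 64 MB, and PyTorch DataLoader worker processes (num_workers > 0) exchange data through shared memory, so without a larger allocation the num_workers parameter is effectively unusable.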
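
A quick way to confirm that the option took effect is to check the size of /dev/shm from inside the container. The helper below is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes shared memory is mounted at /dev/shm, as is usual on Linux.

```python
import os
import shutil
from typing import Optional


def shm_size_gib(path: str = "/dev/shm") -> Optional[float]:
    """Return the total size of the shared-memory mount in GiB, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / 2**30


if __name__ == "__main__":
    size = shm_size_gib()
    if size is None:
        print("/dev/shm not found")
    else:
        print(f"/dev/shm total: {size:.1f} GiB")
```

With --shm-size 32G the reported total should be close to 32 GiB; a value near 0.06 GiB suggests the container is still running with the Docker default.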

Install conda (Miniconda):

bash Miniconda3-py38_23.3.1-0-Linux-x86_64.sh

Configure the pip index. Note that pip reads its configuration from several locations, as shown below:

# This file has been autogenerated or modified by NVIDIA PyIndex.
# In case you need to modify your PIP configuration, please be aware that
# some configuration files may have a priority order. Here are the following 
# files that may exists in your machine by order of priority:
#
# [Priority 1] Site level configuration files
#       1. `/opt/conda/pip.conf`
#
# [Priority 2] User level configuration files
#       1. `/root/.config/pip/pip.conf`
#       2. `/root/.pip/pip.conf`
#
# [Priority 3] Global level configuration files
#       1. `/etc/pip.conf`
#       2. `/etc/xdg/pip/pip.conf`

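The highest-priority file is /opt/conda/pip.conf. The example below edits the user-level file ~/.pip/pip.conf, which takes effect for any option that the site-level file does not set: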

vim ~/.pip/pip.conf

[global]
no-cache-dir = true
index-url = https://pypi.tuna.tsinghua.edu.cn/simple/
extra-index-url = https://pypi.ngc.nvidia.com
trusted-host = pypi.ngc.nvidia.com, pypi.tuna.tsinghua.edu.cn
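
Because several files can define the same option, it helps to see which one actually wins. The sketch below is a simplified model of the priority order listed in the comment block above: it walks the candidate files from highest to lowest priority and returns the first definition it finds. The function name is hypothetical, and real pip also honours environment variables and command-line flags, which this sketch ignores.

```python
import configparser
import os
from typing import Iterable, Optional, Tuple

# Highest priority first, following the order in the comment block above.
CANDIDATES = [
    "/opt/conda/pip.conf",
    "/root/.config/pip/pip.conf",
    "/root/.pip/pip.conf",
    "/etc/pip.conf",
    "/etc/xdg/pip/pip.conf",
]


def effective_option(key: str, paths: Iterable[str] = CANDIDATES,
                     section: str = "global") -> Optional[Tuple[str, str]]:
    """Return (value, source_file) from the highest-priority file that defines key."""
    for path in paths:
        if not os.path.isfile(path):
            continue
        # Interpolation is disabled so that '%' in URLs is read literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)
        if parser.has_option(section, key):
            return parser.get(section, key), path
    return None


if __name__ == "__main__":
    print(effective_option("index-url"))
```

To see what pip itself resolved, run `pip config list` inside the container.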

Configure the conda channel mirrors in .condarc:

channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
channel_priority: disabled
allow_conda_downgrades: true
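
The custom_channels entries map a channel name to a base URL, and conda appends the channel name to that base. The sketch below illustrates the mapping for the configuration above; the function name is hypothetical, and it assumes conda's default channel_alias of https://conda.anaconda.org for any channel that is not listed.

```python
MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud"

# Channel names taken from the custom_channels section above.
CUSTOM_CHANNELS = {
    name: MIRROR
    for name in ("conda-forge", "msys2", "bioconda", "menpo", "pytorch", "simpleitk")
}


def resolve_channel(name: str,
                    custom_channels: dict = CUSTOM_CHANNELS,
                    channel_alias: str = "https://conda.anaconda.org") -> str:
    """Return the URL a named channel resolves to under this configuration."""
    base = custom_channels.get(name, channel_alias)
    return f"{base.rstrip('/')}/{name}"


if __name__ == "__main__":
    for channel in ("pytorch", "conda-forge", "nvidia"):
        print(channel, "->", resolve_channel(channel))
```

One consequence is that `-c pytorch` is served from the mirror, while `-c nvidia` has no custom_channels entry and is still fetched from the default alias.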

Copy the conda configuration into place:

cp .condarc ~/.conda/.

2. Configuring the Conda Environment

Create and activate the environment:

conda create -n cryoem python=3.8
conda activate cryoem

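Install PyTorch with the following command:

# Installs the latest PyTorch at the time of writing, 2.0.1
# Note: the exported environment file in section 3 pins pytorch-cuda=11.8; use 11.8 here to reproduce it exactly
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia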

Check that PyTorch works:

python

import torch
print(torch.__version__)          # the installed PyTorch version, e.g. 2.0.1
print(torch.cuda.is_available())  # should be True when the GPU is usable

Install the pip packages with the following command:

pip install pytorch3d==0.3.0 mrcfile==1.4.3 pyfftw==0.13.1 fairscale==0.4.13 numba==0.57.0 pandas==2.0.1 siren-pytorch==0.1.6 scipy==1.10.1

Note: do not install FairScale with conda. Doing so can interfere with the PyTorch installation and leave the GPU unusable. Install it with pip instead.

The core libraries are:

# installed with conda
pytorch=2.0.1

# installed with pip
pytorch3d==0.3.0
mrcfile==1.4.3
pyfftw==0.13.1
fairscale==0.4.13
numba==0.57.0
pandas==2.0.1
siren-pytorch==0.1.6
scipy==1.10.1
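
After installation it is worth confirming that the installed versions match these pins. The following is a minimal sketch using the standard-library importlib.metadata module; the function name and the injectable get_version parameter are hypothetical conveniences, and only the pip-installed packages are checked.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# The pip pins from the list above.
PINS = {
    "pytorch3d": "0.3.0",
    "mrcfile": "1.4.3",
    "pyfftw": "0.13.1",
    "fairscale": "0.4.13",
    "numba": "0.57.0",
    "pandas": "2.0.1",
    "siren-pytorch": "0.1.6",
    "scipy": "1.10.1",
}


def find_mismatches(pins: Dict[str, str],
                    get_version: Callable[[str], str] = metadata.version
                    ) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for every unmet pin."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    print("all pins satisfied" if not problems else problems)
```

An empty result means every pinned package is present at the expected version; a None value means the package is not installed in the active environment.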

3. Runtime Environment

Save the Docker environment:

# Save the environment
docker ps -a | grep [tag]
docker commit [container-id] cryoem:v1.0
docker save cryoem:v1.0 | gzip > cryoem_v1_0.tar.gz

# Load the environment
docker image load -i cryoem_v1_0.tar.gz
nvidia-docker run -it --name cryoem-[your name] --shm-size 32G -v [...]:[...] cryoem:v1.0

To run on specific GPUs, prefix the bash command with CUDA_VISIBLE_DEVICES=0,1,2,3,... listing the GPU indices to expose.
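
The same restriction can be applied when launching a job from Python, by setting the variable in the child process's environment. This is a minimal sketch using only the standard library; the function name is hypothetical, and the child command here simply echoes the variable instead of running a real training script.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_on_gpus(command: Sequence[str], gpu_ids: Sequence[int]) -> subprocess.CompletedProcess:
    """Run command with CUDA_VISIBLE_DEVICES restricted to gpu_ids."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(list(command), env=env,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for a training script: print what the child process sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    print(run_on_gpus(child, [0, 1]).stdout.strip())
```

Inside the child process the exposed GPUs are renumbered from 0, so code that selects cuda:0 uses the first GPU in the list.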

A yaml file can also be used to create or update the conda environment:

conda env update -n cryoem --file cryoem_env.yaml

The yaml file contains the following configuration:

name: cryoem
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - blas=1.0=mkl
  - brotlipy=0.7.0=py38h27cfd23_1003
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2023.01.10=h06a4308_0
  - certifi=2022.12.7=py38h06a4308_0
  - cffi=1.15.0=py38h7f8727e_0
  - charset-normalizer=2.0.4=pyhd3eb1b0_0
  - cryptography=39.0.1=py38h9ce1e76_0
  - cuda-cudart=11.8.89=0
  - cuda-cupti=11.8.87=0
  - cuda-libraries=11.8.0=0
  - cuda-nvrtc=11.8.89=0
  - cuda-nvtx=11.8.86=0
  - cuda-runtime=11.8.0=0
  - ffmpeg=4.3=hf484d3e_0
  - filelock=3.9.0=py38h06a4308_0
  - freetype=2.12.1=h4a9f257_0
  - giflib=5.2.1=h5eee18b_3
  - gmp=6.2.1=h295c915_3
  - gmpy2=2.1.2=py38heeb90bb_0
  - gnutls=3.6.15=he1e5248_0
  - idna=3.4=py38h06a4308_0
  - intel-openmp=2021.4.0=h06a4308_3561
  - jinja2=3.1.2=py38h06a4308_0
  - jpeg=9e=h5eee18b_1
  - lame=3.100=h7b6447c_0
  - lcms2=2.12=h3be6417_0
  - lerc=3.0=h295c915_0
  - libcublas=11.11.3.6=0
  - libcufft=10.9.0.58=0
  - libcufile=1.6.1.9=0
  - libcurand=10.3.2.106=0
  - libcusolver=11.4.1.48=0
  - libcusparse=11.7.5.86=0
  - libdeflate=1.17=h5eee18b_0
  - libedit=3.1.20221030=h5eee18b_0
  - libffi=3.2.1=hf484d3e_1007
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libiconv=1.16=h7f8727e_2
  - libidn2=2.3.2=h7f8727e_0
  - libnpp=11.8.0.86=0
  - libnvjpeg=11.9.0.86=0
  - libpng=1.6.39=h5eee18b_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtasn1=4.19.0=h5eee18b_0
  - libtiff=4.5.0=h6a678d5_2
  - libunistring=0.9.10=h27cfd23_0
  - libwebp=1.2.4=h11a3e52_1
  - libwebp-base=1.2.4=h5eee18b_1
  - lz4-c=1.9.4=h6a678d5_0
  - markupsafe=2.1.1=py38h7f8727e_0
  - mkl=2021.4.0=h06a4308_640
  - mkl-service=2.4.0=py38h7f8727e_0
  - mkl_fft=1.3.1=py38hd3c417c_0
  - mkl_random=1.2.2=py38h51133e4_0
  - mpc=1.1.0=h10f8cd9_1
  - mpfr=4.0.2=hb69a4c5_1
  - mpmath=1.2.1=py38h06a4308_0
  - ncurses=6.4=h6a678d5_0
  - nettle=3.7.3=hbbd107a_1
  - networkx=2.8.4=py38h06a4308_1
  - numpy=1.24.3=py38h14f4228_0
  - numpy-base=1.24.3=py38h31eccc5_0
  - openh264=2.1.1=h4ff587b_0
  - openssl=1.1.1t=h7f8727e_0
  - pillow=9.4.0=py38h6a678d5_0
  - pip=23.0.1=py38h06a4308_0
  - pycparser=2.21=pyhd3eb1b0_0
  - pyopenssl=23.0.0=py38h06a4308_0
  - pysocks=1.7.1=py38h06a4308_0
  - python=3.8.0=h0371630_2
  - pytorch=2.0.1=py3.8_cuda11.8_cudnn8.7.0_0
  - pytorch-cuda=11.8=h7e8668a_5
  - pytorch-mutex=1.0=cuda
  - readline=7.0=h7b6447c_5
  - requests=2.29.0=py38h06a4308_0
  - setuptools=66.0.0=py38h06a4308_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlite=3.33.0=h62c20be_0
  - sympy=1.11.1=py38h06a4308_0
  - tk=8.6.12=h1ccaba5_0
  - torchaudio=2.0.2=py38_cu118
  - torchtriton=2.0.0=py38
  - torchvision=0.15.2=py38_cu118
  - typing_extensions=4.5.0=py38h06a4308_0
  - urllib3=1.26.15=py38h06a4308_0
  - wheel=0.38.4=py38h06a4308_0
  - xz=5.4.2=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - zstd=1.5.5=hc292b87_0
  - pip:
    - einops==0.6.1
    - fairscale==0.4.13
    - fvcore==0.1.5.post20221221
    - importlib-metadata==6.6.0
    - iopath==0.1.10
    - llvmlite==0.40.0
    - mrcfile==1.4.3
    - numba==0.57.0
    - pandas==2.0.1
    - portalocker==2.7.0
    - pyfftw==0.13.1
    - python-dateutil==2.8.2
    - pytorch3d==0.3.0
    - pytz==2023.3
    - pyyaml==6.0
    - scipy==1.10.1
    - siren-pytorch==0.1.6
    - tabulate==0.9.0
    - termcolor==2.3.0
    - tqdm==4.65.0
    - tzdata==2023.3
    - yacs==0.1.8
    - zipp==3.15.0