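Setting up an sdfstudio container on the AIMAX cluster
- I. Login
- II. Test
- III. Transferring data with FileZilla
- IV. Creating an environment directly from a third-party private image
- Option 1: Download from Docker Hub
- Option 2: Upload a Dockerfile from GitHub
- Option 3: Upload a third-party image from Docker Hub
- 1. Install Docker on Ubuntu
- 2. Pull the third-party image
- 3. Edit hosts
- 4. Download the certificate
- 5. Retag the image
- 6. Log in
- 7. Push the image
- 8. Create an interactive development environment (Terminal)
- 9. Remote connection
- V. Building up the environment step by step on an AIMAX public image
- 1. Create an interactive development environment (Terminal) from a public image
- 2. Create a virtual environment
- 3. Install and configure oh-my-zsh (optional)
- 4. Save the modified container as an image
- 5. Run training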
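I. Login
Open the home page and enter your credentials.
[Screenshot: home page]
II. Test
I tried creating an interactive development environment (Desktop).
Usability was poor, so I did not use this mode for development.
III. Transferring data with FileZilla
- Install FileZilla
On Ubuntu: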
sudo apt-get install filezilla
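On Windows, download the installer from the official website.
- Connect to the host
Enter the connection details and connect.
- Right-click a file and choose Upload.
A notification appears when the upload finishes; the transferred files can be found under Private Data.
IV. Creating an environment directly from a third-party private image
With options 1 and 2 I could not get an image uploaded at all, and the environment created from the image uploaded via option 3 could not be entered; I suspect the image itself is the cause. You can skip this section and go straight to the next one, which builds the environment on top of a public image.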
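Option 1: Download from Docker Hub
The image can be found by searching Docker Hub, but unfortunately I saw no download option in the web UI.
Option 2: Upload a Dockerfile from GitHub
The 4 KB Dockerfile stayed in the uploading state for a long time. Perhaps the image build already starts during this step?
In the end it failed.
The manual has a note on this situation,
but retrying did not help, so I moved on to option 3.
Option 3: Upload a third-party image from Docker Hub
1. Install Docker on Ubuntu
- Install
- docker-ce
- nvidia-docker2
- Add the current user to the docker group.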
❯ sudo cat /etc/group | grep docker
docker:x:998:
❯ sudo usermod -aG docker wj
❯ sudo cat /etc/group | grep docker
❯ sudo chmod a+rw /var/run/docker.sock
❯ sudo systemctl restart docker
❯ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fcd0ba6349f8 hello-world "/hello" About an hour ago Exited (0) About an hour ago blissful_leavitt
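The second grep above shows no output, so it is not obvious whether usermod took effect. A small check like the following can confirm it. This is only a sketch: the function name in_group is made up for illustration, and note that a new group membership normally applies only to sessions started after the change (log out and back in, or run newgrp docker).

```shell
#!/usr/bin/env bash
# Return 0 if user $1 is a member of group $2, otherwise non-zero.
# (in_group is a hypothetical helper name, not part of Docker.)
in_group() {
    local user="$1" group="$2"
    # id -nG prints the user's group names separated by spaces;
    # put one name per line and look for an exact, literal match.
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$group"
}

# Example (wj is the user name from the session above):
#   in_group wj docker && echo "wj is in the docker group"
```

Once the membership is active, docker ps should work without sudo, and opening up /var/run/docker.sock with chmod is usually no longer needed.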
2. Pull the third-party image
To load an image from a local archive instead:
docker load < dockerimages.tar
To pull the third-party image from Docker Hub (the image page on Docker Hub shows the command):
docker pull dromni/sdfstudio:0.2.1
❯ sudo docker pull dromni/sdfstudio:0.2.1
0.2.1: Pulling from dromni/sdfstudio
677076032cca: Pulling fs layer
bc572704fd22: Pulling fs layer
82ca2dd0fe9d: Pulling fs layer
335006729f70: Pulling fs layer
1b9f8e302abf: Pulling fs layer
120deaf0783e: Pulling fs layer
f7b8d7bf559f: Pull complete
e62d0dcce85d: Pull complete
dd4b12c0cbdb: Pull complete
96670d94e1e8: Pull complete
bb10049f791d: Pull complete
9e965195e9d1: Pull complete
f1484bec286b: Pull complete
f1196e20290a: Pull complete
c541d97ea6d8: Pull complete
7f511c789668: Pull complete
737bd131d2c1: Pull complete
270a40ad75d6: Pull complete
f0c0226e364b: Pull complete
6f9fdc754fdc: Pull complete
4f4fb700ef54: Pull complete
30485e8f47b6: Pull complete
d1cb36d9c606: Pull complete
db7430713eb7: Pull complete
19a01bfd85d1: Pull complete
63a1d18dba4d: Pull complete
132d02095598: Pull complete
9bc9681eb426: Pull complete
94c3a9acdb3e: Pull complete
Digest: sha256:1823de016219880ac14dae0bb2d3ba71636802683c24fc60f94bb08b484423e9
Status: Downloaded newer image for dromni/sdfstudio:0.2.1
docker.io/dromni/sdfstudio:0.2.1
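The download succeeded; the image can be listed with docker images.
3. Edit hosts
Edit /etc/hosts on the local machine and add an entry for registry.cluster.local whose IP is the address of the AI Max head node, for example: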
……
……
192.168.124.95 registry.cluster.local
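If you would rather not edit the file by hand, the entry can be appended from a script. The sketch below adds the line only when the name is not mapped yet, so running it twice does no harm. The function name add_host_entry is my own; the IP and host name are the ones from the example above.

```shell
#!/usr/bin/env bash
# Append "IP NAME" to a hosts-style file unless NAME is already mapped.
# Usage: add_host_entry FILE IP NAME   (hypothetical helper name)
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # Look for NAME as a whole field on any non-comment line.
    if awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" 2>/dev/null; then
        echo "$name is already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example (writing to /etc/hosts needs root):
#   add_host_entry /etc/hosts 192.168.124.95 registry.cluster.local
```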
4. Download the certificate
- Create the directory
sudo mkdir -p /etc/docker/certs.d/registry.cluster.local
- Download the certificate
sudo wget -O /etc/docker/certs.d/registry.cluster.local/ca.crt http://192.168.124.95:5680/ca.crt
5. Retag the image
sudo docker tag myimage:v1.0 registry.cluster.local/user_username/myimage:v1.0
In user_username, replace only username with the user name you use to log in to the AI Max web UI; user_ is a prefix and must not be removed.
For example:
sudo docker tag dromni/sdfstudio:0.2.1 registry.cluster.local/user_xxxx/sdfstudio:1.0
[Screenshot: the retagged image]
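The naming rule is easy to get wrong (forgetting the user_ prefix, for instance), so it can help to build the tag in one place. A minimal sketch, assuming the registry host is always registry.cluster.local as in this post; aimax_tag is a made-up helper name.

```shell
#!/usr/bin/env bash
# Build the tag the cluster registry expects:
#   registry.cluster.local/user_<username>/<image>:<version>
# Usage: aimax_tag USERNAME IMAGE VERSION   (hypothetical helper name)
aimax_tag() {
    local registry="registry.cluster.local"
    printf '%s/user_%s/%s:%s\n' "$registry" "$1" "$2" "$3"
}

# Example (xxxx stands in for the real login name, as in the text):
#   sudo docker tag dromni/sdfstudio:0.2.1 "$(aimax_tag xxxx sdfstudio 1.0)"
```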
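6. Log in
- Get the user name and password
On the "Private Images" page you can download the Docker registry credentials file.
The downloaded file, pushImagesDoc.txt, contains the user name and password.
- Log in to registry.cluster.local
Use the user name and password from pushImagesDoc.txt.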
sudo docker login registry.cluster.local
Username: xxxx
Password: xxxxxxxx
The login failed with an error.
Another post I found online had the fix:
add "insecure-registries": ["https://registry.cluster.local"] and "default-runtime": "nvidia" to /etc/docker/daemon.json, so that the final daemon.json reads:
{
"insecure-registries": ["https://registry.cluster.local"],
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
If /etc/docker has no daemon.json yet, create it with the content above.
Then run
sudo systemctl daemon-reload
sudo systemctl restart docker
After that the login succeeded.
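If a daemon.json already exists, overwriting it would silently drop whatever was configured before. The sketch below writes the configuration shown above but keeps a .bak copy of any existing file first, so older settings can be merged back by hand. write_daemon_json is a hypothetical helper name; the JSON is exactly the one from this post.

```shell
#!/usr/bin/env bash
# Write the daemon.json from this post to the given path,
# keeping a backup of any file that is already there.
# Usage: write_daemon_json PATH   (hypothetical helper name)
write_daemon_json() {
    local path="$1"
    # Preserve an existing configuration instead of silently replacing it.
    [ -f "$path" ] && cp -p "$path" "$path.bak"
    cat > "$path" <<'EOF'
{
  "insecure-registries": ["https://registry.cluster.local"],
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Example (needs root; restart docker afterwards as shown above):
#   write_daemon_json /etc/docker/daemon.json
```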
7. Push the image
$ sudo docker push registry.cluster.local/user_username/myimage:v1.0
sudo docker push registry.cluster.local/user_xxxx/sdfstudio:1.0
The push refers to repository [registry.cluster.local/user_xxxx/sdfstudio]
5f70bf18a086: Preparing
75c9930f04e3: Preparing
d607f5331dd0: Preparing
775fd1ca67da: Preparing
5c85fd87e7d2: Preparing
c2cc2815d350: Preparing
c261386e14e8: Waiting
8f5a7461deb4: Waiting
5668a06c4f00: Waiting
6f9a406a17ed: Waiting
32a3407bed0d: Waiting
dc1bbc4db2ec: Waiting
77b9d6e4b433: Waiting
6be54aac0530: Waiting
77f74632d268: Waiting
4d6a42904634: Waiting
d04569d95086: Waiting
c2ecd79d5a18: Waiting
bd889e83e652: Waiting
d2e28f4121e3: Waiting
3a12ac953428: Waiting
11df89f48870: Waiting
2106d7cd1026: Waiting
f403f5c5948a: Waiting
8f6106a133b8: Waiting
af561c199f2f: Waiting
ea83d1f80fca: Waiting
65abf0edb23d: Waiting
c5ff2d88f679: Waiting
denied: requested access to the resource is denied
That failed!
Let me try the hello-world image instead.
❯ sudo docker push registry.cluster.local/user_wuji/hello:v1.0
The push refers to repository [registry.cluster.local/user_wuji/hello]
01bb4fce3eb1: Preparing
denied: requested access to the resource is denied
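It still failed, which rules out the image as the cause: the sdfstudio image is 22 GB, while hello-world is only a few KB.
After some thought I found the cause: the login had been run without sudo, while the push was run with sudo. I logged in again with sudo.
The next push transferred normally.
It took 18 minutes; afterwards the image appears under Private Images.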
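A likely explanation for the sudo mismatch: docker login records the registry in the config.json of whoever runs it (by default ~/.docker/config.json), and a push run through sudo normally looks at root's file instead, assuming sudo switches HOME to root's home directory. The sketch below checks whether a given config file has an entry for the registry; has_registry_login is a made-up helper name.

```shell
#!/usr/bin/env bash
# Return 0 if the Docker client config FILE mentions REGISTRY,
# i.e. a login for that registry was recorded in this file.
# Usage: has_registry_login FILE REGISTRY   (hypothetical helper name)
has_registry_login() {
    local config="$1" registry="$2"
    # The registry appears as a quoted JSON key under "auths".
    [ -f "$config" ] && grep -qF -- "\"$registry\"" "$config"
}

# Example: compare the invoking user's file with root's file.
#   has_registry_login "$HOME/.docker/config.json" registry.cluster.local && echo "user: logged in"
#   sudo grep -qF '"registry.cluster.local"' /root/.docker/config.json && echo "root: logged in"
```

Running login and push under the same user (both with sudo, or both without) avoids the mismatch.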
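8. Create an interactive development environment (Terminal)
Because the image is large, this preparation step also takes a long time.
It succeeded; barring surprises the environment should now be usable.
9. Remote connection
Download and install MobaXterm, free Portable edition (cracked copies circulate online, but they are unnecessary and a security risk).
Connect either by opening a new Session, choosing SSH and entering the connection details, or by typing the ssh command directly in a terminal.
After many attempts, this step failed.
The same procedure with one of AIMAX's own images works without problems.
Not ready to give up, I tried another image from Docker Hub, with the same result. Most likely the environment has to be built up step by step on top of one of the platform's own images.
V. Building up the environment step by step on an AIMAX public image
1. Create an interactive development environment (Terminal) from a public image
/opt/data/private contains the private data you uploaded.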
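2. Create a virtual environment
I am setting up the sdfstudio environment here, following the official tutorial directly; I also have notes from an earlier deployment.
Only the errors I ran into are recorded below.
- Running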
conda activate sdfstudio
produced this error:
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init
Run
conda init bash
then close the shell and reconnect; after that the virtual environment works normally.
- Running
pip install -e .
failed.
The cause was that I had not changed into the project directory; after switching to it the command ran fine.
- Running
ns-install-cli
failed:
(sdfstudio) root@sdfstudio:/opt/data/private/sdfstudio# ns-install-cli
[17:52:23] 🤷 .zshrc not found, skipping. install.py:212
🔍 Found .bashrc! install.py:214
[17:52:24] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-install-cli. install.py:124
❌ Completion script generation failed: ['ns-render-mesh', '--tyro-print-completion', 'bash'] install.py:109
Traceback (most recent call last): install.py:113
File "/opt/conda/envs/sdfstudio/bin/ns-render-mesh", line 5, in <module>
from scripts.render_mesh import entrypoint
File "/opt/data/private/sdfstudio/scripts/render_mesh.py", line 12, in <module>
import cv2
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in
<module>
bootstrap()
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in
bootstrap
native_module = importlib.import_module("cv2")
File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in
import_module
return _bootstrap._gcd_import(name, package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
❌ Completion script generation failed: ['ns-eval', '--tyro-print-completion', 'bash'] install.py:109
Traceback (most recent call last): install.py:113
File "/opt/conda/envs/sdfstudio/bin/ns-eval", line 5, in <module>
from scripts.eval import entrypoint
File "/opt/data/private/sdfstudio/scripts/eval.py", line 11, in <module>
import cv2
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in
<module>
bootstrap()
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in
bootstrap
native_module = importlib.import_module("cv2")
File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in
import_module
return _bootstrap._gcd_import(name, package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
✔ Updated completion at /opt/data/private/sdfstudio/scripts/completions/bash/_ns-dev-test! install.py:122
[17:52:25] ✔ Updated completion at /opt/data/private/sdfstudio/scripts/completions/bash/_ns-process-data! install.py:122
[17:52:26] ❌ Completion script generation failed: ['ns-train', '--tyro-print-completion', 'bash'] install.py:109
Traceback (most recent call last): install.py:113
File "/opt/conda/envs/sdfstudio/bin/ns-train", line 5, in <module>
from scripts.train import entrypoint
File "/opt/data/private/sdfstudio/scripts/train.py", line 48, in <module>
from nerfstudio.configs import base_config as cfg
File "/opt/data/private/sdfstudio/nerfstudio/configs/base_config.py", line 197, in <module>
from nerfstudio.pipelines.base_pipeline import VanillaPipelineConfig
File "/opt/data/private/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 41, in
<module>
from nerfstudio.data.datamanagers.base_datamanager import (
File "/opt/data/private/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line
35, in <module>
from nerfstudio.cameras.cameras import CameraType
File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>
import cv2
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in
<module>
bootstrap()
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in
bootstrap
native_module = importlib.import_module("cv2")
File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in
import_module
return _bootstrap._gcd_import(name, package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
❌ Completion script generation failed: ['ns-download-data', '--tyro-print-completion', install.py:109
'bash']
❌ Completion script generation failed: ['ns-extract-mesh', '--tyro-print-completion', 'bash'] install.py:109
Traceback (most recent call last): install.py:113
File "/opt/conda/envs/sdfstudio/bin/ns-download-data", line 5, in <module>
from scripts.downloads.download_data import entrypoint
File "/opt/data/private/sdfstudio/scripts/downloads/download_data.py", line 17, in <module>
from nerfstudio.configs.base_config import PrintableConfig
File "/opt/data/private/sdfstudio/nerfstudio/configs/base_config.py", line 197, in <module>
from nerfstudio.pipelines.base_pipeline import VanillaPipelineConfig
File "/opt/data/private/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 41, in
<module>
from nerfstudio.data.datamanagers.base_datamanager import (
File "/opt/data/private/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line
35, in <module>
from nerfstudio.cameras.cameras import CameraType
File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>
import cv2
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in
<module>
bootstrap()
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in
bootstrap
native_module = importlib.import_module("cv2")
File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in
import_module
return _bootstrap._gcd_import(name, package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Traceback (most recent call last): install.py:113
File "/opt/conda/envs/sdfstudio/bin/ns-extract-mesh", line 5, in <module>
from scripts.extract_mesh import entrypoint
File "/opt/data/private/sdfstudio/scripts/extract_mesh.py", line 16, in <module>
from nerfstudio.utils.eval_utils import eval_setup
File "/opt/data/private/sdfstudio/nerfstudio/utils/eval_utils.py", line 30, in <module>
from nerfstudio.configs import base_config as cfg
File "/opt/data/private/sdfstudio/nerfstudio/configs/base_config.py", line 197, in <module>
from nerfstudio.pipelines.base_pipeline import VanillaPipelineConfig
File "/opt/data/private/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 41, in
<module>
from nerfstudio.data.datamanagers.base_datamanager import (
File "/opt/data/private/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line
35, in <module>
from nerfstudio.cameras.cameras import CameraType
File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>
import cv2
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in
<module>
bootstrap()
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in
bootstrap
native_module = importlib.import_module("cv2")
File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in
import_module
return _bootstrap._gcd_import(name, package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/opt/conda/envs/sdfstudio/bin/ns-install-cli", line 8, in <module>
sys.exit(entrypoint())
File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 284, in entrypoint
tyro.cli(main, description=__doc__)
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/tyro/_cli.py", line 177, in cli
output = _cli_impl(
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/tyro/_cli.py", line 430, in _cli_impl
out, consumed_keywords = _calling.call_from_args(
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/tyro/_calling.py", line 204, in call_from_args
return unwrapped_f(*positional_args, **kwargs), consumed_keywords # type: ignore
File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 253, in main
completion_paths = list(
File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 255, in <lambda>
lambda path_or_entrypoint_and_shell: _generate_completion(
File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 114, in _generate_completion
raise e
File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 101, in _generate_completion
new = subprocess.run(
File "/opt/conda/envs/sdfstudio/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-download-data', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.
❌ Completion script generation failed: ['ns-render', '--tyro-print-completion', 'bash'] install.py:109
Traceback (most recent call last): install.py:113
File "/opt/conda/envs/sdfstudio/bin/ns-render", line 5, in <module>
from scripts.render import entrypoint
File "/opt/data/private/sdfstudio/scripts/render.py", line 27, in <module>
from nerfstudio.cameras.camera_paths import get_path_from_json, get_spiral_path
File "/opt/data/private/sdfstudio/nerfstudio/cameras/camera_paths.py", line 27, in <module>
from nerfstudio.cameras.cameras import Cameras
File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>
import cv2
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in
<module>
bootstrap()
File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in
bootstrap
native_module = importlib.import_module("cv2")
File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in
import_module
return _bootstrap._gcd_import(name, package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Fix:
apt-get update && apt-get install libgl1
- In addition, because of other changes of mine, the error messages showed that icecream and cryptography also had to be installed:
pip install icecream
pip install cryptography
Running ns-install-cli again succeeded.
(sdfstudio) root@sdfstudio:/opt/data/private/sdfstudio# ns-install-cli
[19:32:24] 🤷 .zshrc not found, skipping. install.py:212
🔍 Found .bashrc! install.py:214
[19:32:25] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-dev-test. install.py:124
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-install-cli. install.py:124
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-process-data. install.py:124
[19:32:36] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-eval. install.py:124
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-download-data. install.py:124
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-extract-mesh. install.py:124
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render-mesh. install.py:124
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render. install.py:124
[19:32:38] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-train. install.py:124
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-eval. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-install-cli. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-train. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-extract-mesh. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-process-data. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-download-data. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render-mesh. install.py:270
🧹 Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-dev-test. install.py:270
🙆 Completions installed to /root/.bashrc. Exciting! Open a new shell to try them out. install.py:186
All done!
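The failed run earlier printed several hundred lines, yet every traceback ended in the same ImportError. To see that at a glance, the log can be reduced to the list of shared libraries that could not be opened. A rough sketch: missing_libs is a made-up helper name, and the pattern matches only the exact "cannot open shared object file" wording seen in the log above.

```shell
#!/usr/bin/env bash
# Print each shared library named in a
# "ImportError: <lib>: cannot open shared object file" line, once.
# Reads the given files, or standard input when no file is given.
# (missing_libs is a hypothetical helper name.)
missing_libs() {
    grep -oE 'ImportError: [^ :]+\.so[.0-9]*: cannot open shared object file' "$@" \
        | sed -E 's/^ImportError: ([^:]+):.*/\1/' \
        | sort -u
}

# Example:
#   ns-install-cli 2>&1 | missing_libs
```

For the log in this post the result is the single name libGL.so.1, which is what installing libgl1 provides.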
A quick test first.
(sdfstudio) root@sdfstudio:/opt/data/private/sdfstudio# ns-train -h
usage: ns-train [-h]
{testsdf,bakedangelo,neuralangelo,bakedsdf,bakedsdf-mlp,neus-facto-angelo,neus-facto,neus-facto-bigmlp,geo-volsdf,monosdf,volsdf,geo-neus,mono-neus,neus,unisurf,mono-unisurf,geo-unisurf,dto,neusW,neus-acc,nerfacto,instant-ngp,mipnerf,semantic-nerfw,vanilla-nerf,tensorf,dnerf,phototourism}
Train a radiance field with nerfstudio. For real captures, we recommend using the nerfacto model.
Nerfstudio allows for customizing your training and eval configs from the CLI in a powerful way, but there
are some things to understand.
The most demonstrative and helpful example of the CLI structure is the difference in output between the
following commands:
ns-train -h
ns-train nerfacto -h nerfstudio-data
ns-train nerfacto nerfstudio-data -h
In each of these examples, the -h applies to the previous subcommand (ns-train, nerfacto, and
nerfstudio-data).
In the first example, we get the help menu for the ns-train script. In the second example, we get the help
menu for the nerfacto model. In the third example, we get the help menu for the nerfstudio-data dataparser.
With our scripts, your arguments will apply to the preceding subcommand in your command, and thus where you
put your arguments matters! Any optional arguments you discover from running
ns-train nerfacto -h nerfstudio-data
need to come directly after the nerfacto subcommand, since these optional arguments only belong to the
nerfacto subcommand:
ns-train nerfacto {nerfacto optional args} nerfstudio-data
╭─ arguments ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help show this help message and exit │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ subcommands ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ {testsdf,bakedangelo,neuralangelo,bakedsdf,bakedsdf-mlp,neus-facto-angelo,neus-facto,neus-facto-bigmlp,g… │
│ testsdf Implementation of TestSDF │
│ bakedangelo Implementation of Neuralangelo with BakedSDF │
│ neuralangelo Implementation of Neuralangelo │
│ bakedsdf Implementation of BackedSDF with multi-res hash grids │
│ bakedsdf-mlp Implementation of BackedSDF with large MLPs │
│ neus-facto-angelo Implementation of Neuralangelo with neus-facto │
│ neus-facto Implementation of NeuS similar to nerfacto where proposal sampler is used. │
│ neus-facto-bigmlp NeuS-facto with big MLP, it is used in training heritage data with 8 gpus │
│ geo-volsdf Implementation of patch warping from GeoNeuS with VolSDF. │
│ monosdf Implementation of MonoSDF. │
│ volsdf Implementation of VolSDF. │
│ geo-neus Implementation of patch warping from GeoNeuS with NeuS. │
│ mono-neus Implementation of MonoSDF with NeuS rendering formulation. │
│ neus Implementation of NeuS. │
│ unisurf Implementation of UniSurf. │
│ mono-unisurf Implementation of MonoSDF with unisurf rendering formulation. │
│ geo-unisurf Implementation of patch warping from GeoNeuS with UniSurf. │
│ dto Occupancy field with density guided sampling │
│ neusW Implementation of Neural Reconstruction in the wild │
│ neus-acc Implementation of NeuS with empty space skipping. │
│ nerfacto Recommended real-time model tuned for real captures. This model will be continually │
│ updated. │
│ instant-ngp Implementation of Instant-NGP. Recommended real-time model for bounded synthetic │
│ data. │
│ mipnerf High quality model for bounded scenes. (slow) │
│ semantic-nerfw Predicts semantic segmentations and filters out transient objects. │
│ vanilla-nerf Original NeRF model. (slow) │
│ tensorf tensorf │
│ dnerf Dynamic-NeRF model. (slow) │
│ phototourism Uses the Phototourism data. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────╯
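All good!
3. Install and configure oh-my-zsh (optional)
Hmm, bash does not show how long commands take; after some thought I decided to install zsh after all.
- Install zsh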
apt install zsh
- Install oh-my-zsh
Transfer the files over, then:
mv .oh-my-zsh ~/.oh-my-zsh
cp ~/.oh-my-zsh/templates/zshrc.zsh-template ~/.zshrc
chsh -s /bin/zsh
- Install powerlevel10k
Transfer the files over (including the fonts), then:
mv powerlevel10k ~/.oh-my-zsh/custom/themes
mkdir -p ~/.fonts
mv MesloLGS* ~/.fonts/
Open ~/.bashrc and look at the conda-related configuration:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
. "/opt/conda/etc/profile.d/conda.sh"
else
export PATH="/opt/conda/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
- Configure oh-my-zsh
Edit ~/.zshrc, set the theme, and append the conda block copied from ~/.bashrc:
vim ~/.zshrc
...
ZSH_THEME="powerlevel10k/powerlevel10k"
...
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
. "/opt/conda/etc/profile.d/conda.sh"
else
export PATH="/opt/conda/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
Reload ~/.zshrc:
source ~/.zshrc
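Copying the conda block by hand works, but it can also be scripted. The sketch below copies everything between the two conda initialize markers from one file to another and does nothing if the destination already has such a block. copy_conda_block is a made-up helper name. The block is copied verbatim, including the shell.bash hook, as in this post; running conda init zsh (zsh is among the shells listed in conda's error message above) would be the other route.

```shell
#!/usr/bin/env bash
# Copy the "conda initialize" block from SRC to the end of DST,
# unless DST already contains one.
# Usage: copy_conda_block SRC DST   (hypothetical helper name)
copy_conda_block() {
    local src="$1" dst="$2"
    local start='# >>> conda initialize >>>'
    local end='# <<< conda initialize <<<'
    # Nothing to do if the destination already carries the start marker.
    if [ -f "$dst" ] && grep -qF -- "$start" "$dst"; then
        return 0
    fi
    # Print only the lines from the start marker through the end marker.
    sed -n "/^$start\$/,/^$end\$/p" "$src" >> "$dst"
}

# Example:
#   copy_conda_block ~/.bashrc ~/.zshrc
```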
After installing zsh, ns-install-cli has to be run once more:
/opt/da/p/sdfstudio ❯ ns-install-cli 🐍 sdfstudio root@sdfstudio 21:51:55
[21:51:58] 🔍 Found .zshrc! install.py:214
🔍 Found .bashrc! install.py:214
[21:51:59] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-install-cli. install.py:124
✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-install-cli! install.py:119
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-dev-test. install.py:124
✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-dev-test! install.py:119
[21:52:00] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-process-data. install.py:124
✔ Wrote new completion to install.py:119
/opt/data/private/sdfstudio/scripts/completions/zsh/_ns-process-data!
[21:52:55] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-download-data. install.py:124
✔ Wrote new completion to install.py:119
/opt/data/private/sdfstudio/scripts/completions/zsh/_ns-download-data!
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-eval. install.py:124
✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-eval! install.py:119
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render. install.py:124
✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render! install.py:119
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-extract-mesh. install.py:124
✔ Wrote new completion to install.py:119
/opt/data/private/sdfstudio/scripts/completions/zsh/_ns-extract-mesh!
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render-mesh. install.py:124
✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render-mesh! install.py:119
[21:52:57] ✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-train! install.py:119
✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-train. install.py:124
🙆 Completions installed to /root/.zshrc. Exciting! Open a new shell to try them out. install.py:186
🧹 Existing completions uninstalled from /root/.bashrc. install.py:180
🙆 Completions installed to /root/.bashrc. Ok! Open a new shell to try them out. install.py:186
All done!
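Done. The cluster hardware can now be used for training.
4. Save the modified container as an image
The image then appears under Private Images and can be used to create new environments.
5. Run training
With large data the job gets killed (raising the memory limit did not fix it; could GPU memory or the parameters be the cause?).
After lowering the resolution, training runs normally.
I noticed that dropping the SSH connection terminates the job. I plan to try the platform's training-job mode later; interactive development apparently requires the session to stay connected the whole time (another computer could be left connected for this).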
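One thing that may be worth trying before switching modes is starting the job detached from the terminal with nohup, so that a hangup signal from the closing SSH session does not reach it. I have not verified this on AIMAX; if the platform stops the whole container when the session ends, this will not help. run_detached is a made-up helper name.

```shell
#!/usr/bin/env bash
# Start a command in the background, immune to SIGHUP,
# with stdout and stderr collected in LOG. Prints the PID.
# Usage: run_detached LOG COMMAND [ARGS...]   (hypothetical helper name)
run_detached() {
    local log="$1"
    shift
    # nohup makes the command ignore SIGHUP; stdin is detached from the terminal.
    nohup "$@" > "$log" 2>&1 < /dev/null &
    echo "$!"
}

# Example (the method name is a placeholder; add your own arguments):
#   run_detached train.log ns-train neus-facto
#   tail -f train.log
```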