09-03 周二 ansible部署和节点管理过程

news2025/1/4 18:48:46
09-03 周二 ansible部署和节点管理过程
时间版本修改人描述
2024年9月3日10:08:58V0.1宋全恒新建文档,

简介

 首先要找一个跳板机,来确保所有的机器都可以访问。然后我们围绕ansible来搭建环境,方便一键执行所有的命令,主要的任务是将这10个节点均挂载NAS服务器,添加我们的harbor服务器,

ansible介绍

 ansible/ansible at v2.17.3是一个自动化的管理工具,可以管理多个节点,实现诸如命令执行,自动挂载,文件拷贝等命令。非常的方便管理集群的场景。

 常用的模块如下所示:

image-20240903153745533

10GPU信息

image-20240903143640134

批量执行

for ip in $(seq 64 73); do ssh root@10.107.204.$ip "systemctl restart docker"; done

结果

 经过设置,在42服务器上使用yuzailiang用户创建了conda虚拟环境,ansible,激活该环境,可实现对于GPU节点的批量操作

部署步骤

创建conda环境,安装ansible

(ansible) yuzailiang@ubuntu:~$ cat update_harbor.yml 
---
- name: Update Docker daemon configuration and ensure valid JSON
  hosts: gpus
  become: yes
  tasks:
    - name: Install Python if not installed
      ansible.builtin.package:
        name: python3
        state: present

    - name: Ensure /etc/docker/daemon.json exists
      ansible.builtin.file:
        path: /etc/docker/daemon.json
        state: touch

    - name: Read existing daemon.json
      ansible.builtin.slurp:
        path: /etc/docker/daemon.json
      register: daemon_json_content

    - name: Decode JSON
      ansible.builtin.set_fact:
        daemon_json: "{{ daemon_json_content['content'] | b64decode | from_json }}"

    - name: Ensure insecure-registries contains the new registry
      ansible.builtin.set_fact:
        updated_daemon_json: >-
          {{
            daemon_json | combine({
              'insecure-registries': (daemon_json['insecure-registries'] | default([])) + ['10.200.88.53']
            })
          }}

    - name: Write updated daemon.json
      ansible.builtin.copy:
        dest: /etc/docker/daemon.json
        content: "{{ updated_daemon_json | to_nice_json }}"
        backup: yes
        mode: '0644'

    - name: Validate JSON syntax
      ansible.builtin.command:
        cmd: 'python3 -m json.tool /etc/docker/daemon.json'
      register: validation_result
      failed_when: validation_result.rc != 0
      ignore_errors: yes

    - name: Print validation result
      ansible.builtin.debug:
        msg: "JSON validation result: {{ validation_result.stdout }}"

    - name: Restart Docker service
      ansible.builtin.service:
        name: docker
        state: restarted

    - name: Log in to Docker registry
      ansible.builtin.command:
        cmd: docker login 10.200.88.53 --username dros_admin --password 'Dros@zjgxn&07101604'
      ignore_errors: yes

配置ansible

新建inventory节点清单

[operator]
10.107.204.64

[framework]
10.107.204.65

[model]
10.107.204.66
10.107.204.67
10.107.204.68
10.107.204.69

[compile]
10.107.204.70

[abstract]
10.107.204.71

[communication]
10.107.204.72
10.107.204.73

# New group that includes all the groups
[gpus:children]
operator
framework
model
compile
abstract
communication

 我们可以进一步的为这些IP起别名,方便我们操作

(ansible) yuzailiang@ubuntu:~$ sudo vim /etc/ansible/hosts 

10.107.204.65
  
[model]
10.107.204.66
10.107.204.67
10.107.204.68
10.107.204.69

[compile]
10.107.204.70

[hardware]
10.107.204.71

[communication]
10.107.204.72
10.107.204.73

# New group that includes all the groups
[gpus:children]
operator
framework
model
compile
hardware
communication

# Aliases for all nodes
[gpus]
gpu1 ansible_host=10.107.204.64
gpu2 ansible_host=10.107.204.65
gpu3 ansible_host=10.107.204.66
gpu4 ansible_host=10.107.204.67
gpu5 ansible_host=10.107.204.68
gpu6 ansible_host=10.107.204.69
gpu7 ansible_host=10.107.204.70
gpu8 ansible_host=10.107.204.71
gpu9 ansible_host=10.107.204.72
gpu10 ansible_host=10.107.204.73

拷贝公钥,免密配置

(ansible) yuzailiang@ubuntu:~/Shell$ bash copy_pub.sh 
正在将公钥复制到 root@10.107.204.64...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.64'"
and check to make sure that only the key(s) you wanted were added.

成功将公钥复制到 10.107.204.64
正在将公钥复制到 root@10.107.204.65...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.65'"
and check to make sure that only the key(s) you wanted were added.

成功将公钥复制到 10.107.204.65
正在将公钥复制到 root@10.107.204.66...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
		(if you think this is a mistake, you may want to use -f option)

成功将公钥复制到 10.107.204.66
正在将公钥复制到 root@10.107.204.67...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.67'"
and check to make sure that only the key(s) you wanted were added.

成功将公钥复制到 10.107.204.67
正在将公钥复制到 root@10.107.204.68...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.68'"
and check to make sure that only the key(s) you wanted were added.

成功将公钥复制到 10.107.204.68
正在将公钥复制到 root@10.107.204.69...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.69'"
and check to make sure that only the key(s) you wanted were added.

成功将公钥复制到 10.107.204.69
正在将公钥复制到 root@10.107.204.70...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.70'"
and check to make sure that only the key(s) you wanted were added.

成功将公钥复制到 10.107.204.70
正在将公钥复制到 root@10.107.204.71...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"

 进一步的,可以优化这个脚本,方便复用

(ansible) yuzailiang@ubuntu:~/Shell$ cat copy_pub.sh 
#!/bin/bash

# 参数检查
if [ $# -ne 3 ]; then
  echo "使用方法: $0 <基础IP> <起始IP> <终止IP>"
  echo "示例: $0 10.107.204 72 73"
  exit 1
fi

# 获取参数
BASE_IP="$1."
START_IP=$2
END_IP=$3

# SSH用户
USER="root"

# SSH密码
PASSWORD="qsgctys@05980"

# 公钥路径
PUB_KEY_PATH="$HOME/.ssh/id_rsa.pub"

# 检查sshpass是否安装
if ! command -v sshpass &> /dev/null; then
  echo "sshpass未安装,请先安装它。"
  exit 1
fi

# 检查公钥是否存在
if [ ! -f "$PUB_KEY_PATH" ]; then
  echo "SSH公钥未找到,请生成公钥或指定正确的路径。"
  exit 1
fi

# 循环遍历IP范围并复制公钥
for i in $(seq $START_IP $END_IP); do
  FULL_IP="$BASE_IP$i"
  echo "正在将公钥复制到 $USER@$FULL_IP..."
  
  # 使用sshpass传递密码并复制公钥
  sshpass -p "$PASSWORD" ssh-copy-id -i "$PUB_KEY_PATH" -o StrictHostKeyChecking=no "$USER@$FULL_IP"
  
  if [ $? -eq 0 ]; then
    echo "成功将公钥复制到 $FULL_IP"
  else
    echo "无法连接到 $FULL_IP,跳过..."
  fi
done

echo "所有操作完成。"

配置远端用户/etc/ansible/ansible.cfg

 由于在本机的用户为yuzailiang,而远端操作机器的用户为root,因此我们需要关联私钥和用户。配置

(ansible) yuzailiang@ubuntu:~/Shell$ sudo cat /etc/ansible/ansible.cfg 
[defaults]
remote_user = root
private_key_file = ~/.ssh/id_rsa
interpreter_python = auto

 最后interpreter_python = auto是为了抑制警告。

因此,在使用ansible环境时,需要使用42服务器,使用yuzailiang用户登录,激活环境ansible,然后就能愉快的操作这些节点组了。

使用playbook编辑hosts

 新建play-bok剧本文件

(ansible) yuzailiang@ubuntu:~$ cat update_hosts.yml 
---
- name: Ensure /etc/hosts contains NAS entry
  hosts: gpus  # 指定目标组名
  become: yes  # 提升权限以编辑 /etc/hosts
  tasks:
    - name: Check if /etc/hosts contains NAS entry
      ansible.builtin.lineinfile:
        path: /etc/hosts
        line: "10.15.35.70 NAS"
        state: present
        backup: yes  # 可选,备份文件
      tags: hosts


(ansible) yuzailiang@ubuntu:~$ ansible-playbook  update_hosts.yml -l model

PLAY [Ensure /etc/hosts contains NAS entry] **********************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************
[WARNING]: Platform linux on host 10.107.204.67 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.67]
[WARNING]: Platform linux on host 10.107.204.69 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.69]
[WARNING]: Platform linux on host 10.107.204.68 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.68]
[WARNING]: Platform linux on host 10.107.204.66 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.66]

TASK [Check if /etc/hosts contains NAS entry] ********************************************************************************************************************************************************************
changed: [10.107.204.67]
changed: [10.107.204.66]
changed: [10.107.204.68]
changed: [10.107.204.69]

PLAY RECAP *******************************************************************************************************************************************************************************************************
10.107.204.66              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.67              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.68              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.69              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

挂载NAS

新建剧本ensure_mounts.yml

(ansible) yuzailiang@ubuntu:~$ cat ensure_mounts.yml 
---
- name: Ensure directories and mounts are configured
  hosts: all  # 或者指定特定的组,如 'gpus'
  become: yes  # 提升权限以创建目录、编辑 /etc/fstab 和执行挂载操作
  tasks:
    - name: Ensure directories exist
      ansible.builtin.file:
        path: "{{ item }}"
        state: directory
        mode: '0755'
      loop:
        - /mnt/nas_v1
        - /mnt/nas_v2
        - /mnt/self-define

    - name: Ensure fstab contains necessary entries
      ansible.builtin.lineinfile:
        path: /etc/fstab
        line: "{{ item }}"
        state: present
        backup: yes  # 可选,备份文件
      loop:
        - "nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0"
        - "nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0"
        - "nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0"

    - name: Ensure all filesystems are mounted
      ansible.builtin.mount:
        path: "{{ item.path }}"
        src: "{{ item.src }}"
        fstype: "{{ item.fstype }}"
        opts: "{{ item.opts }}"
        state: mounted
      loop:
        - { path: "/mnt/nas_v1", src: "nas:/volume1/1", fstype: "nfs", opts: "defaults" }
        - { path: "/mnt/self-define", src: "nas:/volume1/1/self-define", fstype: "nfs", opts: "defaults" }
        - { path: "/mnt/nas_v2", src: "nas:/volume2/2", fstype: "nfs", opts: "defaults" }

执行命令

 执行上述剧本,创建目录,更新/etc/fstab 并且执行挂载

(ansible) yuzailiang@ubuntu:~$ ansible-playbook ensure_mounts.yml -l gpus

PLAY [Ensure directories and mounts are configured] **************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************
[WARNING]: Platform linux on host 10.107.204.67 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.67]
[WARNING]: Platform linux on host 10.107.204.68 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.68]
[WARNING]: Platform linux on host 10.107.204.64 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.64]
[WARNING]: Platform linux on host 10.107.204.65 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.65]
[WARNING]: Platform linux on host 10.107.204.66 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.66]
[WARNING]: Platform linux on host 10.107.204.69 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.69]
[WARNING]: Platform linux on host 10.107.204.72 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.72]
[WARNING]: Platform linux on host 10.107.204.70 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.70]
[WARNING]: Platform linux on host 10.107.204.73 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.73]
fatal: [10.107.204.71]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.107.204.71 port 22: Connection timed out", "unreachable": true}

TASK [Ensure directories exist] **********************************************************************************************************************************************************************************
ok: [10.107.204.68] => (item=/mnt/nas_v1)
ok: [10.107.204.65] => (item=/mnt/nas_v1)
changed: [10.107.204.66] => (item=/mnt/nas_v1)
ok: [10.107.204.64] => (item=/mnt/nas_v1)
ok: [10.107.204.67] => (item=/mnt/nas_v1)
ok: [10.107.204.68] => (item=/mnt/nas_v2)
ok: [10.107.204.65] => (item=/mnt/nas_v2)
changed: [10.107.204.66] => (item=/mnt/nas_v2)
ok: [10.107.204.67] => (item=/mnt/nas_v2)
ok: [10.107.204.64] => (item=/mnt/nas_v2)
ok: [10.107.204.68] => (item=/mnt/self-define)
ok: [10.107.204.65] => (item=/mnt/self-define)
ok: [10.107.204.64] => (item=/mnt/self-define)
ok: [10.107.204.67] => (item=/mnt/self-define)
ok: [10.107.204.66] => (item=/mnt/self-define)
ok: [10.107.204.69] => (item=/mnt/nas_v1)
ok: [10.107.204.70] => (item=/mnt/nas_v1)
ok: [10.107.204.73] => (item=/mnt/nas_v1)
ok: [10.107.204.72] => (item=/mnt/nas_v1)
ok: [10.107.204.69] => (item=/mnt/nas_v2)
ok: [10.107.204.70] => (item=/mnt/nas_v2)
ok: [10.107.204.72] => (item=/mnt/nas_v2)
ok: [10.107.204.73] => (item=/mnt/nas_v2)
ok: [10.107.204.69] => (item=/mnt/self-define)
ok: [10.107.204.70] => (item=/mnt/self-define)
ok: [10.107.204.72] => (item=/mnt/self-define)
ok: [10.107.204.73] => (item=/mnt/self-define)

TASK [Ensure fstab contains necessary entries] *******************************************************************************************************************************************************************
ok: [10.107.204.64] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.67] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.68] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.66] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.65] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.64] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.67] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.66] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.65] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.68] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.64] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.67] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.65] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.66] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.68] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.69] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.70] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.73] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.72] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.69] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.70] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.72] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.73] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.69] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.70] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.72] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.73] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)

TASK [Ensure all filesystems are mounted] ************************************************************************************************************************************************************************
ok: [10.107.204.66] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.65] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
ok: [10.107.204.66] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.67] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.64] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.68] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
ok: [10.107.204.66] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.65] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.67] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.64] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.68] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.65] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.67] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.64] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.69] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.68] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.69] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.70] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.72] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.73] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.69] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.70] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.72] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.73] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.70] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.72] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.73] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})

PLAY RECAP *******************************************************************************************************************************************************************************************************
10.107.204.64              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.65              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.66              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.67              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.68              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.69              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.70              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.71              : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.72              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.73              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  

harbor处理

 如果在某些节点上的 /etc/docker/daemon.json 文件中已经包含了 "insecure-registries" 配置项,并且你希望添加新的仓库地址而不删除现有的项,你需要确保更新操作不会覆盖现有的配置。Ansible 的 blockinfile 模块可以帮助你添加新的配置,同时保留文件中已存在的其他内容。

新建playbook

---
- name: Update Docker daemon configuration and login to repository if needed
  hosts: gpus
  become: yes
  tasks:
    - name: Install python3
      ansible.builtin.package:
        name: python3
        state: present

    - name: Ensure /etc/docker/daemon.json exists
      ansible.builtin.file:
        path: /etc/docker/daemon.json
        state: touch

    - name: Add new registry to /etc/docker/daemon.json
      ansible.builtin.blockinfile:
        path: /etc/docker/daemon.json
        block: |
          {
            "insecure-registries": ["10.200.88.53"]
          }
        marker: "# {mark} ANSIBLE MANAGED BLOCK"
        create: yes
        backup: yes
        mode: '0644'
        validate: 'python3 -m json.tool %s > /dev/null'

    - name: Restart Docker service
      ansible.builtin.service:
        name: docker
        state: restarted

    - name: Check if Docker is already logged in
      ansible.builtin.command:
        cmd: docker info | grep "Username:"
      register: docker_login_status
      ignore_errors: yes

    - name: Log in to Docker registry if not already logged in
      ansible.builtin.command:
        cmd: docker login 10.200.88.53 --username dros_admin --password 'Dros@zjgxn&07101604'
      when: docker_login_status.rc != 0
      ignore_errors: yes

playbook解析

 上述命令解析如下

确保 Python 已安装

  • 确保在节点上安装了 Python3,因为 json.tool 需要 Python3 支持。

确保 /etc/docker/daemon.json 存在

  • 确保该文件存在,即使它是空文件。

读取现有的 daemon.json

  • 使用 slurp 模块读取现有的 JSON 文件内容。

解码 JSON

  • 将读取到的 base64 编码的内容解码并转换为 JSON 对象。

确保包含新仓库地址

  • 更新 JSON 对象,确保 insecure-registries 中包含新的仓库地址。

写入更新后的 daemon.json

  • 将更新后的 JSON 写入到 /etc/docker/daemon.json,并进行备份。

验证 JSON 语法

  • 验证 JSON 文件的语法正确性。

重启 Docker 服务

  • 确保 Docker 服务使用新的配置重新启动。

直接登录 Docker 注册表

  • 尝试登录 Docker 注册表,如果登录失败不会中断 Playbook 的执行。

执行命令

(ansible) yuzailiang@ubuntu:~$ ansible-playbook update_harbor.yml

TASK [Restart Docker service] ************************************************************************************************************************************************************************************
changed: [10.107.204.65]
changed: [10.107.204.66]
changed: [10.107.204.64]
changed: [10.107.204.68]
changed: [10.107.204.67]
changed: [10.107.204.72]
changed: [10.107.204.69]
changed: [10.107.204.70]
changed: [10.107.204.73]

TASK [Log in to Docker registry] *********************************************************************************************************************************************************************************
changed: [10.107.204.64]
changed: [10.107.204.67]
changed: [10.107.204.66]
changed: [10.107.204.65]
changed: [10.107.204.68]
changed: [10.107.204.69]
changed: [10.107.204.72]
changed: [10.107.204.70]
changed: [10.107.204.73]

PLAY RECAP *******************************************************************************************************************************************************************************************************
10.107.204.64              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.65              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.66              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.67              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.68              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.69              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.70              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.71              : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.72              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.73              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   


总结

 本文围绕ansible,以及ansible命令和ansible-playbook命令完成了自动化集群管理的环境部署,以及使用,通过自动完成harbor仓库配置,NAS目录挂载,更新hosts,等同类任务方便所有GPU节点的使用。ansible是一个非常良好的自动化管理工具。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2103033.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

通信算法之232: 无线发射功率和信号强度,常用单位dB、dBm、dBi和dBd介绍

[转载] 无线功率和信号强度的基本概念 在无线网络中&#xff0c;使用AP设备和天线来实现有线和无线信号互相转换。如下图所示&#xff1a; 有线网络侧的数据从AP设备的有线接口进入AP后&#xff0c;经AP处理为射频信号&#xff0c;从AP的发送端&#xff08;TX&#xff09;经过…

JAVA-JVM 内存模型类加载器GC算法GC调优

JAVA-JVM 内存模型&类加载器&GC算法&GC调优 什么是JVM JVM 内存模型 JVM的GC算法 JVM类加载器 什么是JVM ? [[jvm]]是Java Virtual Machine&#xff08;Java虚拟机&#xff09;的缩写&#xff0c;JVM是一个虚构出来的计算机&#xff0c;有着自己完善的硬件架构&a…

Qwen-7B-Chat大模型安装训练推理-helloworld

初始大模型之helloworld编写 开发环境&#xff1a;modelscope GPU版本上测试的&#xff0c;GPU免费36小时 ps:可以不用conda直接用环境自带的python环境使用 魔搭社区 安装conda wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh 1.2 bash Mini…

港科夜闻 | 香港科大举办开学嘉年华,叶玉如校长勉励新生发掘潜能传承凡事皆可为精神...

关注并星标 每周阅读港科夜闻 建立新视野 开启新思维 1、香港科大举办开学嘉年华&#xff0c;叶玉如校长勉励新生发掘潜能传承「凡事皆可为」精神。迎接新学年&#xff0c;香港科大于9月2日起举行为期两天的开学嘉年华「Fire Up Your Year」&#xff0c;校长叶玉如教授联同一众…

AI写作保姆级方法论第六节-AI的终极调教心法(问题+解决方案)

效果是什么 大象基于大量的实战经验&#xff0c;总结出了AI prompt调教的终极杀手锏&#xff1a;【终极调教心法&#xff1a;1个原则和3个技巧】 一个原则&#xff0c;是指AI的【角色扮演法】&#xff0c;openai官方基于AI原理给出的让AI听话的技巧。所有AI的使用玩法&#xff…

Leetcode3250. 单调数组对的数目 I

Every day a Leetcode 题目来源&#xff1a;3250. 单调数组对的数目 I 解法1&#xff1a;记忆化搜索 题目输入一个数组nums。 假设有两个数组A和B&#xff0c;A递增&#xff0c;B递减&#xff0c;且 Ai Bi numsi ​ 问有多少对(A,B)数组对。 解法&#xff1a; 代码&…

java基础知识-JVM知识详解

一、JVM内存结构 Java虚拟机(JVM)的内存结构主要分为几个不同的区域,每个区域都有其特定的目的和功能。以下是JVM内存结构的主要组成部分: 先看一下总体的结构图 程序计数器(Program Counter Register) 这是一个较小的内存块,用于存储当前线程所执行的字节码指令的地址…

第T4周:猴痘病识别

本文为&#x1f517;365天深度学习训练营 中的学习记录博客原作者&#xff1a;K同学啊 我的环境&#xff1a; ● 语言环境&#xff1a;Python3.6.5 ● 编译器&#xff1a;jupyter notebook ● 深度学习框架&#xff1a;TensorFlow 2.6.2 ● 数据&#xff1a;猴痘病数据集 一、…

非 congda 环境 ubuntu 22.04 源码编译安装 pytorch 并初步检查可用性

非 congda 环境 编译安装 pytorch 0, 安装 cuda sdk &#xff0c;cudnn 及 nccl 按照官网步骤&#xff0c;blacklist需要特别注意 0.1 cuda sdk 0.2 cudnn 0.3 安装nccl git clone --recursive https://github.com/NVIDIA/nccl.git ls cd nccl/ make -j src.build sudo apt…

使用 docker 部署 kvm 图形化管理工具 WebVirtMgr

文章目录 [toc]前提条件镜像构建启动 webvirtmgr创建其他 superuser配置 nginx 反向代理和域名访问绑定 kvm 宿主机local sockettcp 连接 虚拟机创建创建快照虚拟机克隆删除虚拟机 kvm 官方提供了以下这些图形化管理&#xff0c;license 这块也提示了是商业版&#xff08;Comme…

rometheus Blackbox监控网站

Blackbox Exporter简介 blackbox_exporter 是 Prometheus 拿来对 http/https、tcp、icmp、dns、进行的黑盒监控工具&#xff0c;也就是从服务、主机等外部进行探测&#xff0c;来查看服务、主机等是否可用。 Blackbox Exporter 默认端口是 9115&#xff0c; 安装1 wget htt…

Codeforces Round (Div.3) C.Sort (前缀和的应用)

原题&#xff1a; time limit per test&#xff1a;5 seconds memory limit per test&#xff1a;256 megabytes You are given two strings a and b of length n. Then, you are (forced against your will) to answer q queries. For each query, you are given a range …

Dify 与 FastGPT 流程编排能力对比分析

Dify 与 FastGPT 流程编排能力对比分析 一、引言 在人工智能快速发展的今天&#xff0c;大语言模型&#xff08;LLM&#xff09;应用平台正在重塑各行各业的工作流程。其中&#xff0c;Dify 和 FastGPT 作为两款具有重要影响力的工具&#xff0c;凭借各自独特的流程编排能力&a…

智能化升级:AI在客服知识库中的应用

引言 在数字化时代&#xff0c;客户服务已成为企业竞争的关键一环。随着人工智能&#xff08;AI&#xff09;技术的飞速发展&#xff0c;传统客服模式正经历着前所未有的变革。AI与客服知识库的深度融合&#xff0c;不仅极大地提升了客服处理的效率与准确性&#xff0c;还为用…

unreal engine 5.4.4 runtime 使用PCG

Unreal PCG Runtime runtime环境下控制PCG PCG Graph 这里简单的在landscape上Spawn Static Mesh 和 Spawn Actor GraphSetting 自定义的参数&#xff0c;方便修改 场景 这里新建了一个蓝图Actor PCG_Ctrl, 用来runtime的时候控制PCG生成 Construct 获取场景中的PCGVolum…

Oracle版本简介手册

Oracle版本简介手册 图1—数据库发布路线图表 Oracle数据库的各个版本反映了其技术的发展历程和功能增强&#xff0c;从最早的Oracle 1&#xff08;1979年&#xff09;到最新的版本&#xff0c;每个版本都带来了新的特性和改进&#xff0c;以满足不断变化的企业需求。以下是Or…

【数学建模国赛思路预约】2024全国大学生数学建模竞赛助攻思路、代码、论文

2024年全国大学生数学建模大赛马上就要开始了&#xff0c;大家有没有准备好呢&#xff0c;今年将会和之前一样&#xff0c;将会在比赛赛中时期为大家提供比赛各题的相关解题思路、可运行代码参考以及成品论文。 一、分享计划表如下所示 1、 赛中分享内容包括&#xff08;2023国…

详解React setState调用原理和批量更新的过程

1. React setState 调用的原理 setState目录 1. React setState 调用的原理2. React setState 调用之后发生了什么&#xff1f;是同步还是异步&#xff1f;3. React中的setState批量更新的过程是什么&#xff1f; 具体的执行过程如下&#xff08;源码级解析&#xff09;&#x…

马尔科夫决策过程(MDP):详解与应用

马尔科夫决策过程&#xff08;MDP&#xff09;&#xff1a;详解与应用 引言 在人工智能、机器学习和运筹学等领域&#xff0c;马尔科夫决策过程&#xff08;Markov Decision Process&#xff0c;MDP&#xff09;是一个基础而重要的数学模型。MDP 被广泛应用于优化决策问题&am…

1.1什么是SQL注入

SQL 注入&#xff08;Injection&#xff09; 概述 SQL注入即是指web应用程序对用户输入数据的合法性没有判断或过滤不严&#xff0c;攻击者可以在web应用程序中事先定义好的查询语句的结尾上添加额外的SQL语句&#xff0c;在管理员不知情的情况下实现非法操作&#xff0c;以此来…