DeepSeek 3FS Cluster Deployment Notes (Draft)


    • I. 3FS Cluster Deployment
      • 1. Environment Overview
      • 2. Package Installation
      • 3. Building
      • 4. Deployment
        • 4.1 Deploying monitor_collector_main
        • Step 2: Admin client
        • Step 3: Mgmtd service
        • Step 4: Meta service
        • Step 5: Storage service
        • Step 6: Create admin user, storage targets and chain table
        • Step 7: FUSE client
      • 5. Problems Encountered
      • 6. Explanation of Selected Commands
        • 6.1 The First Command
        • Parameter breakdown
        • 6.2 The Second Command
          • Parameter breakdown
        • The first command (`data_placement.py`)
        • The second command (`gen_chain_table.py`)

I. 3FS Cluster Deployment

Official documentation and issues:
https://github.com/deepseek-ai/3FS/issues
https://github.com/deepseek-ai/3FS/blob/main/deploy/README.md

1. Environment Overview

Node                 Mgmt IP        25G IP       OS            Services                                                                                          Notes
3fs-node-meta001     172.20.99.94   11.12.63.55  Ubuntu 22.04  mgmtd_main.service, meta_main.service, hf3fs_fuse_main.service, foundationdb, clickhouse-server  admin_cli required
3fs-node-storage001  172.20.99.96   11.12.63.54  Ubuntu 22.04  storage_main.service                                                                              admin_cli required
3fs-node-storage002  172.20.99.121  11.12.63.57  Ubuntu 22.04  storage_main.service                                                                              admin_cli required
3fs-node-storage003  172.20.99.122  11.12.63.58  Ubuntu 22.04  storage_main.service                                                                              admin_cli required
RDMA Configuration

Assign IP addresses to RDMA NICs. Multiple RDMA NICs (InfiniBand or RoCE) are supported on each node.
Check RDMA connectivity between nodes using ib_write_bw.
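A minimal point-to-point bandwidth check with ib_write_bw could look like the following; the device name matches the ibdev2netdev output later in these notes and the peer IP is one of the 25G addresses above, so adjust both to your environment:

# On the receiving node, start the server side:
ib_write_bw -d mlx5_bond_0

# On the sending node, point at the receiver's RDMA IP:
ib_write_bw -d mlx5_bond_0 11.12.63.54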

Deployment notes:

  • Port conflict: I run the mgmtd service and clickhouse-server on the same node, which produces a conflict on port 9000 and prevents mgmtd from starting.
  • Fix: change port 9000 in the clickhouse-server config file; here I moved it to 6000.
root@3fs-node-meta01:~# netstat  -antulp | grep 6000 | head
tcp        0      0 172.20.99.94:50876      172.20.99.94:6000       ESTABLISHED 154445/monitor_coll
tcp        0      0 172.20.99.94:50852      172.20.99.94:6000       ESTABLISHED 154445/monitor_coll
tcp        0      0 172.20.99.94:50904      172.20.99.94:6000       ESTABLISHED 154445/monitor_coll
tcp        0      0 172.20.99.94:50926      172.20.99.94:6000       ESTABLISHED 154445/monitor_coll
tcp        0      0 172.20.99.94:50966      172.20.99.94:6000       ESTABLISHED 154445/monitor_coll


root@3fs-node-meta01:/etc/clickhouse-server# ls
config.d  config.xml  users.d  users.xml
root@3fs-node-meta01:/etc/clickhouse-server# pwd
/etc/clickhouse-server
root@3fs-node-meta01:/etc/clickhouse-server# grep -rn 6000 *
config.xml:104:    <tcp_port>6000</tcp_port>
config.xml:721:                    <port>6000</port>
config.xml:732:                    <port>6000</port>
config.xml:736:                    <port>6000</port>
config.xml:740:                    <port>6000</port>
config.xml:747:                    <port>6000</port>
config.xml:751:                    <port>6000</port>
config.xml:755:                    <port>6000</port>
config.xml:763:                     <port>6000</port>
config.xml:769:                     <port>6000</port>
config.xml:777:                    <port>6000</port>
config.xml:783:                    <port>6000</port>
config.xml:792:                    <port>6000</port>
config.xml:799:                    <port>6000</port>
config.xml:816:                    <port>6000</port>

# After changing the port, restart the clickhouse-server service and then check its status and listening ports. I have already made the change and restarted the service here.
root@3fs-node-meta01:/etc/clickhouse-server# systemctl status clickhouse-server.service
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
     Loaded: loaded (/lib/systemd/system/clickhouse-server.service; disabled; vendor preset: enabled)
     Active: active (running) since Sun 2025-03-16 15:01:54 CST; 3h 57min ago
   Main PID: 143832 (clckhouse-watch)
      Tasks: 249 (limit: 154032)
     Memory: 3.1G
        CPU: 2h 55min 31.779s
     CGroup: /system.slice/clickhouse-server.service
             ├─143832 clickhouse-watchdog "" "" "" "" "" "" "" --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-ser>
             └─143833 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse>

Mar 16 15:01:54 3fs-node-meta01 systemd[1]: Started ClickHouse Server (analytic DBMS for big data).
Mar 16 15:01:54 3fs-node-meta01 clickhouse-server[143832]: Processing configuration file '/etc/clickhouse-server/config.xml'.
Mar 16 15:01:54 3fs-node-meta01 clickhouse-server[143832]: Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Mar 16 15:01:54 3fs-node-meta01 clickhouse-server[143832]: Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Mar 16 15:01:55 3fs-node-meta01 clickhouse-server[143833]: Processing configuration file '/etc/clickhouse-server/config.xml'.
Mar 16 15:01:55 3fs-node-meta01 clickhouse-server[143833]: Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs>
Mar 16 15:01:55 3fs-node-meta01 clickhouse-server[143833]: Processing configuration file '/etc/clickhouse-server/users.xml'.
Mar 16 15:01:55 3fs-node-meta01 clickhouse-server[143833]: Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs>
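For reference, the same edit can be made non-interactively. A minimal sketch, assuming port 9000 only appears in config.xml as the <tcp_port> value and in the sample cluster <port> entries shown in the grep above:

cd /etc/clickhouse-server
cp config.xml config.xml.bak
sed -i 's/>9000</>6000</g' config.xml
systemctl restart clickhouse-server.service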

  • Network configuration: my environment has both a management NIC and a 25G network, but reading the 3FS source code shows that it cannot recognize bond devices, so the installation kept reporting network errors. I also tried editing the config file to specify the device explicitly, but that did not work either.
root@3fs-node-meta01:/opt/3fs/etc# cat monitor_collector_main.toml | grep filter_list -C 5

[server.base.groups.io_worker.transport_pool]
max_connections = 1

[server.base.groups.listener]
filter_list = ['eno4'] # empty by default; I tried bond0 and bond1, i.e. filter_list = ['bond0'] and filter_list = ['bond1'], but neither was recognized
listen_port = 10000
listen_queue_depth = 4096
rdma_listen_ethernet = true
reuse_port = false

# The error log confirms that the code does not handle bond devices:
[2025-03-14T22:18:34.450205338+08:00 monitor_collect:416868 Listener.cc:87 ERROR] No available address for listener with network type: TCP, filter list:
[2025-03-14T22:18:34.450235037+08:00 monitor_collect:416868 ServiceGroup.cc:26 ERROR] error: RPC::ListenFailed(2011)
[2025-03-14T22:18:34.450243611+08:00 monitor_collect:416868 Server.cc:27 ERROR] Setup group (MonitorCollector) failed: RPC::ListenFailed(2011)
[2025-03-14T22:18:34.450250294+08:00 monitor_collect:416868 Server.cc:31 ERROR] Server::setup failed: RPC::ListenFailed(2011)
[2025-03-14T22:18:34.450259443+08:00 monitor_collect:416868 OnePhaseApplication.h:101 FATAL] Setup server failed: RPC::ListenFailed

# The relevant source code (src/common/net/Listener.cc, checkNicType):

case Address::TCP:
  return nic.starts_with("en") || nic.starts_with("eth");
case Address::IPoIB:
  return nic.starts_with("ib");
case Address::RDMA:
  return nic.starts_with("en") || nic.starts_with("eth");

Configuration that was attempted:
[server.base.groups.listener]
filter_list = ["bond0", "bond1"]  # or "ens4f0np0"
listen_port = 10000


Or patch the source and recompile:
vim /home/3fs/src/common/net/Listener.cc
static bool checkNicType(std::string_view nic, Address::Type type) {
  switch (type) {
    case Address::TCP:
      return nic.starts_with("en") || nic.starts_with("eth") || nic.starts_with("bond");
    case Address::IPoIB:
      return nic.starts_with("ib");
    case Address::RDMA:
      return nic.starts_with("en") || nic.starts_with("eth") || nic.starts_with("bond");
    case Address::LOCAL:
      return nic.starts_with("lo");
    default:
      return false;
  }
}


Even after configuring bond here it still failed. Is it that getNetworkInterfaces() and checkNicType() do not support bond* devices? This needs further confirmation.
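A quick way to check which local interface names would pass the unpatched prefix filter (only en*/eth*, ib* and lo are accepted, per the switch above):

# List interface names and keep only those checkNicType() would accept
ip -br link show | awk '{print $1}' | cut -d@ -f1 |
  grep -E '^(en|eth|ib|lo)' || echo 'no acceptable interface name; bond*/dummy* are filtered out'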

2. Package Installation

foundationDB: https://apple.github.io/foundationdb/administration.html

It can also be run in a container:
docker run -d --net=host --name fdb-server foundationdb/foundationdb:7.1.42
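After the container starts, the database can be checked with fdbcli from inside the image. Note that a brand-new single-node instance must be initialized once; this is a sketch for a test setup, not a production layout (if fdbcli cannot find the cluster file, pass its in-container path with -C):

docker exec fdb-server fdbcli --exec "configure new single memory"
docker exec fdb-server fdbcli --exec status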

clickhouse: https://clickhouse.com/docs/install
rust: just run: curl https://sh.rustup.rs -sSf | sh
Both services are straightforward to install, so I will not go into detail here. One thing to note: if FoundationDB gets into a bad state and you want to reinstall it, first purge it with dpkg -P foundationdb-server.

  • Note: pay attention to the FoundationDB version.
FoundationDB
Ensure that the version of FoundationDB client matches the server version, or copy the corresponding version of libfdb_c.so to maintain compatibility.
Find the fdb.cluster file and libfdb_c.so at /etc/foundationdb/fdb.cluster, /usr/lib/libfdb_c.so on nodes with FoundationDB installed.

libfuse 3.16.1 or newer version
FoundationDB 7.1 or newer version
Rust toolchain: minimal 1.75.0, recommended 1.85.0 or newer version (latest stable version)

# Installing Rust via the USTC mirror is very fast (useful in mainland China):
export RUSTUP_DIST_SERVER=https://mirrors.ustc.edu.cn/rust-static
export RUSTUP_UPDATE_ROOT=https://mirrors.ustc.edu.cn/rust-static/rustup
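With the mirror variables exported, the standard rustup installer picks them up:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
rustc --version   # expect 1.75.0 or newer, per the requirements above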
These are the corresponding installation packages:
root@3fs-node-meta01:/home/3fs_sft# ls
clickhouse-client_22.6.2.12_all.deb           foundationdb-clients_7.3.35-1_amd64.deb  fuse-3.16.1.tar.gz
clickhouse-common-static_22.6.2.12_amd64.deb  foundationdb-server_7.3.35-1_amd64.deb
clickhouse-server_22.6.2.12_all.deb           fuse-3.16.1
root@3fs-node-meta01:/home/3fs_sft#

# Installing the corresponding software
1. Install OFED
wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.10.0/host/doca-host_2.10.0-093000-25.01-ubuntu2204_amd64.deb
dpkg -i doca-host_2.10.0-093000-25.01-ubuntu2204_amd64.deb
apt-get update
apt-get -y install doca-ofed
 
2. Verify OFED
Run ibdev2netdev; output like the following means the installation succeeded (supported hardware is also required):
root@3fs-node-meta01:/home/3fs_sft# ibdev2netdev
mlx5_bond_0 port 1 ==> bond1 (Up)


# libfuse
wget https://github.com/libfuse/libfuse/releases/download/fuse-3.16.1/fuse-3.16.1.tar.gz
tar vzxf fuse-3.16.1.tar.gz
cd fuse-3.16.1/
mkdir build && cd build
apt install -y meson
meson setup ..
ninja && ninja install
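A quick sanity check that the freshly built libfuse is the one being picked up:

ldconfig                       # refresh the linker cache after ninja install
pkg-config --modversion fuse3  # should print 3.16.1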

3. Building

  • Building on the host:
# Install dependencies
# for Ubuntu 20.04.
apt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \
  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \
  libgoogle-perftools-dev google-perftools libssl-dev libclang-rt-14-dev gcc-10 g++-10 libboost1.71-all-dev

# for Ubuntu 22.04.
apt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \
  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \
  libgoogle-perftools-dev google-perftools libssl-dev gcc-12 g++-12 libboost-all-dev

# for openEuler 2403sp1
yum install cmake libuv-devel lz4-devel xz-devel double-conversion-devel libdwarf-devel libunwind-devel \
    libaio-devel gflags-devel glog-devel gtest-devel gmock-devel clang-tools-extra clang lld \
    gperftools-devel gperftools openssl-devel gcc gcc-c++ boost-devel



Build 3FS in build folder:

cmake -S . -B build -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
cmake --build build -j 32

  • Containerized build

root@3fs-meta:/home/3fs# cat Dockerfile
# Author: mmwei3
# Email: mmwei3@iflytek.com
# date: 20250313
# -------------------------------------------
# Stage 1: Build 3FS from source
# -------------------------------------------
FROM ubuntu:22.04 AS builder

# Non-interactive install (avoid the tzdata prompt)
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
    git cmake make clang-14 libfuse3-dev libssl-dev pkg-config \
    curl ca-certificates wget unzip \
    # FoundationDB client library headers (optional; can also be downloaded manually)
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Install the Rust toolchain (3FS needs Rust >= 1.75); ENV persists across layers, unlike RUN export
ENV RUSTUP_DIST_SERVER=https://mirrors.ustc.edu.cn/rust-static
ENV RUSTUP_UPDATE_ROOT=https://mirrors.ustc.edu.cn/rust-static/rustup
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

WORKDIR /build

# Clone the 3FS source
#RUN git clone https://github.com/deepseek-ai/3FS.git /build

# -------------------------------------------
# Stage 2: Create minimal runtime image
# -------------------------------------------
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
    libfuse3-3 libssl3 ca-certificates \
    cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \
    libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \
    libgoogle-perftools-dev google-perftools libssl-dev gcc-12 g++-12 libboost-all-dev \
    make libfuse3-dev pkg-config \
    curl wget unzip \
    # FoundationDB client library (mount fdb.cluster into the container yourself)
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Create the directory layout
RUN mkdir -p /opt/3fs/bin /opt/3fs/etc /var/log/3fs

# Copy the 3FS binaries from the builder stage
#COPY --from=builder /build/build/bin/* /opt/3fs/bin/
COPY bin/* /opt/3fs/bin/


# entrypoint script that starts the requested role based on its argument
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

WORKDIR /opt/3fs
ENTRYPOINT ["/entrypoint.sh"]
CMD ["help"]


root@3fs-meta:/home/3fs# docker images
REPOSITORY                         TAG          IMAGE ID       CREATED      SIZE
mmwei3.images.build/deepseek-3fs   20250314-5   80be8c8bda15   2 days ago   4.12GB

 docker run -d -it --net=host --name 3fs-mgmtd --privileged   -v /mycluster/config:/opt/3fs/etc   -v /mycluster/logs:/var/log/3fs  --device=/dev/infiniband:/dev/infiniband  -v /usr/lib:/usr/lib   --cap-add=NET_RAW --cap-add=IPC_LOCK --cap-add=CAP_NET_ADMIN  mmwei3.images.build/deepseek-3fs:20250314-6 mgmtd
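The same image serves the other roles through the entrypoint below; two hedged examples using the same mounts and tag:

# Follow the mgmtd container's logs
docker logs -f 3fs-mgmtd

# Drop into the interactive admin CLI from the same image
docker run --rm -it --net=host \
  -v /mycluster/config:/opt/3fs/etc \
  mmwei3.images.build/deepseek-3fs:20250314-6 admin-cli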
# The corresponding /entrypoint.sh:
root@3fs-meta:/home/3fs# cat entrypoint.sh
#!/usr/bin/env bash
# Author: mmwei3
# Email: mmwei3@iflytek.com
# Date: 2025-03-13

set -e

ROLE="$1"
CFG_DIR="/opt/3fs/etc"
BIN_DIR="/opt/3fs/bin"
LOG_DIR="/var/log/3fs"

# Make sure the log directory exists
mkdir -p "$LOG_DIR"

# Show help if no role argument was given
if [[ -z "$ROLE" || "$ROLE" == "help" ]]; then
    echo "Usage: docker run <options> mmwei3.images.build/deepseek-3fs:20250313 <role>"
    echo "Available roles:"
    echo "  mgmtd     - Start management daemon"
    echo "  meta      - Start metadata service"
    echo "  storage   - Start storage service"
    echo "  client    - Start FUSE client"
    echo "  admin-cli - Start interactive admin CLI"
    echo "Example:"
    echo "  docker run --rm deepseek-3fs:latest mgmtd"
    exit 1
fi

# Dispatch by role
case "$ROLE" in
  mgmtd)
    exec "$BIN_DIR/mgmtd_main" -cfg "$CFG_DIR/mgmtd_main.toml"
    ;;
  meta)
    exec "$BIN_DIR/meta_main" -cfg "$CFG_DIR/meta_main.toml"
    ;;
  storage)
    exec "$BIN_DIR/storage_main" -cfg "$CFG_DIR/storage_main.toml"
    ;;
  client)
    exec "$BIN_DIR/hf3fs_fuse_main" -cfg "$CFG_DIR/hf3fs_fuse_main.toml"
    ;;
  admin-cli)
    exec "$BIN_DIR/admin_cli" -cfg "$CFG_DIR/admin_cli.toml"
    ;;
  *)
    echo "Error: Unknown role '$ROLE'"
    echo "Run 'docker run ... deepseek-3fs:latest help' for usage instructions."
    exit 1
    ;;
esac

4. Deployment

Run on the meta node:

Since I set up bonding earlier, I first have to create a dummy interface with a virtual address:
ip link add eno3 type dummy
ip addr add 11.12.63.54/24 dev eno3
ip link set eno3 up
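The dummy interface above does not survive a reboot. One way to persist it, sketched with systemd-networkd (assuming it is active, as on a default Ubuntu 22.04 server):

cat > /etc/systemd/network/10-eno3.netdev <<'EOF'
[NetDev]
Name=eno3
Kind=dummy
EOF

cat > /etc/systemd/network/10-eno3.network <<'EOF'
[Match]
Name=eno3

[Network]
Address=11.12.63.54/24
EOF

systemctl restart systemd-networkd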

4.1 Deploying monitor_collector_main

Install monitor_collector service on the meta node.

  1. Copy monitor_collector_main to /opt/3fs/bin and config files to /opt/3fs/etc, and create log directory /var/log/3fs.
    mkdir -p /opt/3fs/{bin,etc}
    mkdir -p /var/log/3fs
    cp /root/3fs/build/bin/monitor_collector_main /opt/3fs/bin
    cp /root/3fs/configs/monitor_collector_main.toml /opt/3fs/etc
    
  2. Update monitor_collector_main.toml to add a ClickHouse connection:
    [server.monitor_collector.reporter]
    type = 'clickhouse'
    
    [server.monitor_collector.reporter.clickhouse]
    db = '3fs'
    host = '172.20.99.94'
    passwd = '7cmwyBmw'
    port = '6000'
    user = 'default'
    
  3. Start monitor service:
    cp /root/3fs/deploy/systemd/monitor_collector_main.service /usr/lib/systemd/system
    systemctl start monitor_collector_main
    

Note that

  • Multiple instances of monitor services can be deployed behind a virtual IP address to share the traffic.
  • Other services communicate with the monitor service over a TCP connection.
root@3fs-node-meta01:/opt/3fs/etc# clickhouse-client -n < /home/3fs/deploy/sql/3fs-monitor.sql
root@3fs-node-meta01:/opt/3fs/etc# cp /home/3fs/build/bin/monitor_collector_main /opt/3fs/bin
root@3fs-node-meta01:/opt/3fs/etc# cp /home/3fs/configs/monitor_collector_main.toml /opt/3fs/etc
root@3fs-node-meta01:/opt/3fs/etc# cat monitor_collector_main.toml
[common]
cluster_id = 'stage'

[common.ib_devices]
allow_unknown_zone = true
default_network_zone = 'UNKNOWN'
device_filter = []
subnets = []

[[common.log.categories]]
categories = [ '.' ]
handlers = [ 'normal', 'err', 'fatal' ]
inherit = true
level = 'INFO'
propagate = 'NONE'

[[common.log.handlers]]
async = true
file_path = '/var/log/3fs/monitor_collector_main.log'
max_file_size = '100MB'
max_files = 10
name = 'normal'
rotate = true
rotate_on_open = false
start_level = 'NONE'
stream_type = 'STDERR'
writer_type = 'FILE'

[[common.log.handlers]]
async = false
file_path = '/var/log/3fs/monitor_collector_main-err.log'
max_file_size = '100MB'
max_files = 10
name = 'err'
rotate = true
rotate_on_open = false
start_level = 'ERR'
stream_type = 'STDERR'
writer_type = 'FILE'

[[common.log.handlers]]
async = false
file_path = '/var/log/3fs/monitor_collector_main-fatal.log'
max_file_size = '100MB'
max_files = 10
name = 'fatal'
rotate = true
rotate_on_open = false
start_level = 'FATAL'
stream_type = 'STDERR'
writer_type = 'STREAM'

[server.base.independent_thread_pool]
bg_thread_pool_stratetry = 'SHARED_QUEUE'
collect_stats = false
enable_work_stealing = false
io_thread_pool_stratetry = 'SHARED_QUEUE'
num_bg_threads = 2
num_connect_threads = 2
num_io_threads = 2
num_proc_threads = 2
proc_thread_pool_stratetry = 'SHARED_QUEUE'

[server.base.thread_pool]
bg_thread_pool_stratetry = 'SHARED_QUEUE'
collect_stats = false
enable_work_stealing = false
io_thread_pool_stratetry = 'SHARED_QUEUE'
num_bg_threads = 2
num_connect_threads = 2
num_io_threads = 2
num_proc_threads = 2
proc_thread_pool_stratetry = 'SHARED_QUEUE'

[[server.base.groups]]
#default_timeout = '1s'
#drop_connections_interval = '1h'
network_type = 'TCP'
services = [ 'MonitorCollector' ]
use_independent_thread_pool = false

[server.base.groups.io_worker]
num_event_loop = 1
rdma_connect_timeout = '5s'
read_write_rdma_in_event_thread = false
read_write_tcp_in_event_thread = false
tcp_connect_timeout = '1s'
wait_to_retry_send = '100ms'

[server.base.groups.io_worker.ibsocket]
buf_ack_batch = 8
buf_signal_batch = 8
buf_size = 16384
drop_connections = 0
event_ack_batch = 128
#gid_index = 0
max_rd_atomic = 16
max_rdma_wr = 128
max_rdma_wr_per_post = 32
max_sge = 16
min_rnr_timer = 1
pkey_index = 0
record_bytes_per_peer = false
record_latency_per_peer = false
retry_cnt = 7
rnr_retry = 0
send_buf_cnt = 32
sl = 0
start_psn = 0
timeout = 14
traffic_class = 0

[server.base.groups.io_worker.transport_pool]
max_connections = 1

[server.base.groups.listener]
filter_list = ['eno4']
listen_port = 10000
listen_queue_depth = 4096
rdma_listen_ethernet = true
reuse_port = false

[server.base.groups.processor]
enable_coroutines_pool = true
max_coroutines_num = 256
max_processing_requests_num = 4096

[server.monitor_collector]
batch_commit_size = 4096
conn_threads = 32
queue_capacity = 204800

[server.monitor_collector.reporter]
type = 'clickhouse'

[server.monitor_collector.reporter.clickhouse]
db = '3fs'
host = '172.20.99.94'
passwd = 'thinkbig1'
port = '6000'
user = 'default'


root@3fs-node-meta01:/opt/3fs/etc# systemctl status monitor_collector_main.service
● monitor_collector_main.service - monitor_collector_main Server
     Loaded: loaded (/lib/systemd/system/monitor_collector_main.service; disabled; vendor preset: enabled)
     Active: active (running) since Sun 2025-03-16 17:00:20 CST; 2h 34min ago
   Main PID: 154445 (monitor_collect)
      Tasks: 59 (limit: 154032)
     Memory: 291.8M
        CPU: 11.241s
     CGroup: /system.slice/monitor_collector_main.service
             └─154445 /opt/3fs/bin/monitor_collector_main --cfg /opt/3fs/etc/monitor_collector_main.toml

Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690640766+08:00 monitor_collect:154445 LogConfig.cc>
Mar 16 17:00:20 3fs-node-meta01 monitor_collector_main[154445]: [2025-03-16T17:00:20.690669262+08:00 monitor_collect:154445 OnePhaseAppl>


Step 2: Admin client

Install admin_cli on all nodes.

  1. Copy admin_cli to /opt/3fs/bin and config files to /opt/3fs/etc.
    mkdir -p /opt/3fs/{bin,etc}
    rsync -avz meta:~/3fs/build/bin/admin_cli /opt/3fs/bin
    rsync -avz meta:~/3fs/configs/admin_cli.toml /opt/3fs/etc
    rsync -avz meta:/etc/foundationdb/fdb.cluster /opt/3fs/etc
    
    # single node:
    cp /root/3fs/build/bin/admin_cli /opt/3fs/bin
    cp /root/3fs/configs/admin_cli.toml /opt/3fs/etc
    cp /etc/foundationdb/fdb.cluster /opt/3fs/etc
    
  2. Update admin_cli.toml to set cluster_id and clusterFile:
    cluster_id = "stage"
    
    [fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
    

The full help documentation for admin_cli can be displayed by running the following command:


root@3fs-node-meta01:/opt/3fs/etc# /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml help 
bench                          Usage: bench [--rank VAR] [--timeout VAR] [--coroutines VAR] [--seconds VAR] [--remove] path
cd                             Usage: cd [-L] [--inode] path
checksum                       Usage: checksum [--list] [--batch VAR] [--md5] [--fillZero] [--output VAR] path
create                         Usage: create [--perm VAR] [--chain-table-id VAR] [--chain-table-ver VAR] [--chain-list VAR] [--chunk-size VAR] [--stripe-size VAR] path
create-range                   Usage: create-range [--concurrency VAR] prefix inclusive_start exclusive_end
create-target                  Usage: create-target --node-id VAR --disk-index VAR --target-id VAR --chain-id VAR [--add-chunk-size] [--chunk-size VAR...] [--use-new-chunk-engine]
create-targets                 Usage: create-targets --node-id VAR [--disk-index VAR...] [--allow-existing-target] [--add-chunk-size] [--use-new-chunk-engine]
current-user                   Usage: current-user
decode-user-token              Usage: decode-user-token token
drop-user-cache                Usage: drop-user-cache [--uid VAR] [--all]


Step 3: Mgmtd service

Install mgmtd service on meta node.

  1. Copy mgmtd_main to /opt/3fs/bin and config files to /opt/3fs/etc.

    cp /root/3fs/build/bin/mgmtd_main /opt/3fs/bin
    cp /root/3fs/configs/{mgmtd_main.toml,mgmtd_main_launcher.toml,mgmtd_main_app.toml} /opt/3fs/etc
    
  2. Update config files:

    • Set mgmtd node_id = 1 in mgmtd_main_app.toml.
    • Edit mgmtd_main_launcher.toml to set the cluster_id and clusterFile:
    cluster_id = "stage"
    
    [fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
    
    • Set monitor address in mgmtd_main.toml:
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000" # replace with your own monitor node's TCP address; an RDMA address will not work here and causes failures
    
  3. Initialize the cluster:

    root@3fs-node-meta01:/opt/3fs/etc# /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "init-cluster --mgmtd /opt/3fs/etc/mgmtd_main.toml 1 1048576 16"
    Init filesystem, root directory layout: chain table ChainTableId(1), chunksize 1048576, stripesize 16
    Init config for MGMTD version 1
    
    

    The parameters of admin_cli:

    • 1 the chain table ID
    • 1048576 the chunk size in bytes
    • 16 the file stripe size

    Run help init-cluster for full documentation.

  4. Start mgmtd service:

    cp /root/3fs/deploy/systemd/mgmtd_main.service /usr/lib/systemd/system
    systemctl start mgmtd_main
    
    root@3fs-node-meta01:/opt/3fs/etc# systemctl status mgmtd_main.service
    ● mgmtd_main.service - mgmtd_main Server
      Loaded: loaded (/lib/systemd/system/mgmtd_main.service; disabled; vendor preset: enabled)
      Active: active (running) since Sun 2025-03-16 17:01:48 CST; 2h 34min ago
    Main PID: 154612 (mgmtd_main)
       Tasks: 37 (limit: 154032)
      Memory: 267.5M
         CPU: 37.587s
      CGroup: /system.slice/mgmtd_main.service
              └─154612 /opt/3fs/bin/mgmtd_main --launcher_cfg /opt/3fs/etc/mgmtd_main_launcher.toml --app-cfg /opt/3fs/etc/mgmtd_main_app>

    Mar 16 17:01:48 3fs-node-meta01 mgmtd_main[154612]: [2025-03-16T17:01:48.410318530+08:00 mgmtd_main:154612 LogConfig.cc:96 INFO]     "fa>
    
  5. Run list-nodes command to check if the cluster has been successfully initialized:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"
    

If multiple instances of mgmtd services are deployed, one of them is elected as the primary; the others are secondaries. Automatic failover occurs when the primary fails.


Step 4: Meta service

Install meta service on meta node.

  1. Copy meta_main to /opt/3fs/bin and config files to /opt/3fs/etc.
    cp ~/3fs/build/bin/meta_main /opt/3fs/bin
    cp ~/3fs/configs/{meta_main_launcher.toml,meta_main.toml,meta_main_app.toml} /opt/3fs/etc
    
  2. Update config files:
    • Set meta node_id = 100 in meta_main_app.toml.
    • Set cluster_id, clusterFile and mgmtd address in meta_main_launcher.toml:
    cluster_id = "stage"
    
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
    • Set mgmtd and monitor addresses in meta_main.toml.
    [server.mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000"
    
    [server.fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
    
  3. Config file of meta service is managed by mgmtd service. Use admin_cli to upload the config file to mgmtd:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type META --file /opt/3fs/etc/meta_main.toml"
    
  4. Start meta service:
    cp ~/3fs/deploy/systemd/meta_main.service /usr/lib/systemd/system
    systemctl start meta_main
    
  5. Run list-nodes command to check if meta service has joined the cluster:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"
    

If multiple instances of meta services are deployed, meta requests will be evenly distributed across all instances.


Step 5: Storage service

Install storage service on storage node.

  1. Format the attached 16 SSDs as XFS and mount at /storage/data{1..16}, then create data directories /storage/data{1..16}/3fs and log directory /var/log/3fs.
    mkdir -p /storage/data{1..16}
    mkdir -p /var/log/3fs
    for i in {1..16};do mkfs.xfs -L data${i} /dev/nvme${i}n1;mount -o noatime,nodiratime -L data${i} /storage/data${i};done
    mkdir -p /storage/data{1..16}/3fs
    
  2. Increase the max number of asynchronous aio requests (see the note after this list for making the setting persistent):
    sysctl -w fs.aio-max-nr=67108864
    
  3. Copy storage_main to /opt/3fs/bin and config files to /opt/3fs/etc.
    rsync -avz meta:~/3fs/build/bin/storage_main /opt/3fs/bin
    rsync -avz meta:~/3fs/configs/{storage_main_launcher.toml,storage_main.toml,storage_main_app.toml} /opt/3fs/etc
    
  4. Update config files:
    • Set node_id in storage_main_app.toml. Each storage service is assigned a unique id between 10001 and 10005.
    • Set cluster_id and mgmtd address in storage_main_launcher.toml.
    cluster_id = "stage"
    
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
    • Add target paths in storage_main.toml:
    [server.mgmtd]
    mgmtd_server_address = ["RDMA://192.168.1.1:8000"]
    
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000"
    
    [server.targets]
    target_paths = ["/storage/data1/3fs","/storage/data2/3fs","/storage/data3/3fs","/storage/data4/3fs","/storage/data5/3fs","/storage/data6/3fs","/storage/data7/3fs","/storage/data8/3fs","/storage/data9/3fs","/storage/data10/3fs","/storage/data11/3fs","/storage/data12/3fs","/storage/data13/3fs","/storage/data14/3fs","/storage/data15/3fs","/storage/data16/3fs",]
    
  5. Config file of storage service is managed by mgmtd service. Use admin_cli to upload the config file to mgmtd:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type STORAGE --file /opt/3fs/etc/storage_main.toml"
    
  6. Start storage service:
    rsync -avz meta:~/3fs/deploy/systemd/storage_main.service /usr/lib/systemd/system
    systemctl start storage_main
    
  7. Run list-nodes command to check if storage service has joined the cluster:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"
    
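The sysctl in step 2 above only lasts until the next reboot. A standard way to persist it (the drop-in file name is my own choice):

echo 'fs.aio-max-nr = 67108864' > /etc/sysctl.d/99-3fs-aio.conf
sysctl --system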

Step 6: Create admin user, storage targets and chain table
  1. Create an admin user:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "user-add --root --admin 0 root"
    
    The admin token is printed to the console; save it to /opt/3fs/etc/token.txt.
  2. Generate admin_cli commands to create storage targets on 5 storage nodes (16 SSD per node, 6 targets per SSD).
    • Install the required Python packages:
    pip install -r ~/3fs/deploy/data_placement/requirements.txt
    python ~/3fs/deploy/data_placement/src/model/data_placement.py \
       -ql -relax -type CR --num_nodes 5 --replication_factor 3 --min_targets_per_disk 6
    python ~/3fs/deploy/data_placement/src/setup/gen_chain_table.py \
       --chain_table_type CR --node_id_begin 10001 --node_id_end 10005 \
       --num_disks_per_node 16 --num_targets_per_disk 6 \
       --target_id_prefix 1 --chain_id_prefix 9 \
       --incidence_matrix_path output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1/incidence_matrix.pickle
    
    The following 3 files will be generated in output directory: create_target_cmd.txt, generated_chains.csv, and generated_chain_table.csv.
  3. Create storage targets:
    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") < output/create_target_cmd.txt
    
  4. Upload chains to mgmtd service:
    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chains output/generated_chains.csv"
    
  5. Upload chain table to mgmtd service:
    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chain-table --desc stage 1 output/generated_chain_table.csv"
    
  6. List chains and chain tables to check if they have been correctly uploaded:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-chains"
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-chain-tables"
    

Step 7: FUSE client

For simplicity the FUSE client is deployed on the meta node in this guide. However, we strongly advise against deploying clients on service nodes in a production environment.

  1. Copy hf3fs_fuse_main to /opt/3fs/bin and config files to /opt/3fs/etc.
    cp ~/3fs/build/bin/hf3fs_fuse_main /opt/3fs/bin
    cp ~/3fs/configs/{hf3fs_fuse_main_launcher.toml,hf3fs_fuse_main.toml,hf3fs_fuse_main_app.toml} /opt/3fs/etc
    
  2. Create the mount point:
    mkdir -p /3fs/stage
    
  3. Set cluster ID, mountpoint, token file and mgmtd address in hf3fs_fuse_main_launcher.toml
    cluster_id = "stage"
    mountpoint = '/3fs/stage'
    token_file = '/opt/3fs/etc/token.txt'
    
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
  4. Set mgmtd and monitor address in hf3fs_fuse_main.toml.
    [mgmtd]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000"
    
  5. Config file of FUSE client is also managed by mgmtd service. Use admin_cli to upload the config file to mgmtd:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml"
    
  6. Start FUSE client:
    cp ~/3fs/deploy/systemd/hf3fs_fuse_main.service /usr/lib/systemd/system
    systemctl start hf3fs_fuse_main
    
  7. Check if 3FS has been mounted at /3fs/stage:
    mount | grep '/3fs/stage'
    
root@3fs-node-meta01:/opt/3fs/etc# df -Th
Filesystem     Type        Size  Used Avail Use% Mounted on
tmpfs          tmpfs        13G  2.6M   13G   1% /run
/dev/sdy3      ext4        437G   59G  356G  15% /
tmpfs          tmpfs        63G   16K   63G   1% /dev/shm
tmpfs          tmpfs       5.0M     0  5.0M   0% /run/lock
/dev/sdy2      ext4        974M  234M  673M  26% /boot
/dev/sdy1      vfat        1.1G  6.1M  1.1G   1% /boot/efi
tmpfs          tmpfs        13G  4.0K   13G   1% /run/user/0
hf3fs.stage    fuse.hf3fs   81T  611G   81T   1% /3fs/stage
root@3fs-node-meta01:/opt/3fs/etc# df -Th | grep ^C
root@3fs-node-meta01:/opt/3fs/etc# mount | grep 3fs
hf3fs.stage on /3fs/stage type fuse.hf3fs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=1048576)
root@3fs-node-meta01:/opt/3fs/etc#

5. Problems Encountered

1. Some library files have to be copied into place:
root@3fs-node-meta01:/opt/3fs/etc# cp /home/3fs/third_party/jemalloc/lib/libjemalloc.so.2 /usr/lib/
2. When a service misbehaves, check the corresponding log file; the logs are quite clear.

root@3fs-node-meta01:/opt/3fs/etc# /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://11.12.63.55:8000"]' "list-nodes"
Id     Type     Status               Hostname             Pid     Tags  LastHeartbeatTime    ConfigVersion  ReleaseVersion
1      MGMTD    PRIMARY_MGMTD        3fs-node-meta01      154612  []    N/A                  1(UPTODATE)    250228-dev-1-999999-c450ee0c
100    META     HEARTBEAT_CONNECTED  3fs-node-meta01      159886  []    2025-03-16 19:41:06  1(UPTODATE)    250228-dev-1-999999-c450ee0c
10001  STORAGE  HEARTBEAT_CONNECTED  3fs-node-storage001  928385  []    2025-03-16 19:41:15  3(UPTODATE)    250228-dev-1-999999-c450ee0c
10002  STORAGE  HEARTBEAT_CONNECTED  3fs-node-storage002  676814  []    2025-03-16 19:41:15  3(UPTODATE)    250228-dev-1-999999-c450ee0c
10003  STORAGE  HEARTBEAT_CONNECTED  3fs-node-storage003  676104  []    2025-03-16 19:41:16  3(UPTODATE)    250228-dev-1-999999-c450ee0c
root@3fs-node-meta01:/opt/3fs/etc#

3. Python dependencies involved:
root@3fs-node-meta01:/opt/3fs/etc# cd deploy/data_placement
root@3fs-node-meta01:/opt/3fs/etc# pip install -r requirements.txt
4. Service port issue: the monitor endpoint is TCP only; pointing mgmtd at an RDMA address makes it fail. NIC issue: bond devices are not recognized.
5. Scale-out/scale-in, disk replacement, and similar operational procedures still need to be investigated.
6. To make working with ClickHouse easier, I wrote a small tool:


root@3fs-node-meta01:/home/mmwei3# ls
click_tool  fdb_tool
root@3fs-node-meta01:/home/mmwei3# cd click_tool/
root@3fs-node-meta01:/home/mmwei3/click_tool# ls
bak_cli.py  build  clickhouse_tool.egg-info  clickhouse_tool.py  README.md  setup.py
root@3fs-node-meta01:/home/mmwei3/click_tool# cat clickhouse_tool.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
ClickHouse CLI Tool

Developer: mmwei3
Date: 2025-03-12
"""

from clickhouse_driver import Client
import logging
import argparse
import json

# Version information
TOOL_VERSION = "1.0"
DEVELOPER_INFO = "mmwei3 for 20250312"

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

class ClickHouseTool:
    def __init__(self, host='172.20.99.94', port=9000, user='default', password='thinkbig1', database='default'):
        """Initialize ClickHouse connection (default connects to 172.20.99.94)"""
        try:
            self.client = Client(host=host, port=port, user=user, password=password, database=database)
            logging.info(f"Successfully connected to ClickHouse server: {host}:{port}, database: {database}")
        except Exception as e:
            logging.error(f"ClickHouse connection failed: {e}")
            raise

    def execute_query(self, query, params=None):
        """Execute SQL query"""
        try:
            result = self.client.execute(query, params)
            return result
        except Exception as e:
            logging.error(f"SQL execution error: {e}")
            return None

    def insert_data(self, table, data):
        """Insert data"""
        if not data:
            logging.warning("Data is empty, insert operation skipped")
            return

        keys = data[0].keys()
        values = [tuple(item.values()) for item in data]

        query = f"INSERT INTO {table} ({', '.join(keys)}) VALUES"
        try:
            self.client.execute(query, values, types_check=True)
            logging.info("Data inserted successfully")
        except Exception as e:
            logging.error(f"Insertion failed: {e}")

    def select_data(self, query, params=None):
        """Select data"""
        result = self.execute_query(query, params)
        if result is not None:
            logging.info("Query successful")
            return result
        return []

    def update_data(self, query, params=None):
        """Update data"""
        self.execute_query(query, params)
        logging.info("Update successful")

    def delete_data(self, query, params=None):
        """Delete data"""
        self.execute_query(query, params)
        logging.info("Delete successful")

    def close(self):
        """Close connection"""
        self.client.disconnect()
        logging.info("ClickHouse connection closed")

# CLI Tool
def cli():
    parser = argparse.ArgumentParser(
        description="ClickHouse CRUD Tool (mmwei3 for 20250312)\n\n"
                    "Usage examples:\n"
                    "1. Insert data:\n"
                    "   clickhouse_tool --action insert --table my_table --data '[{\"id\": 1, \"name\": \"Alice\"}]'\n"
                    "2. Select data:\n"
                    "   clickhouse_tool --action select --query 'SELECT * FROM my_table'\n"
                    "3. Update data:\n"
                    "   clickhouse_tool --action update --query 'ALTER TABLE my_table UPDATE name=\"Charlie\" WHERE id=1'\n"
                    "4. Delete data:\n"
                    "   clickhouse_tool --action delete --query 'ALTER TABLE my_table DELETE WHERE id=1'\n"
    )

    parser.add_argument('--host', type=str, default='172.20.99.94', help="ClickHouse server address (default: 172.20.99.94)")
    parser.add_argument('--port', type=int, default=9000, help="ClickHouse port (default: 9000)")
    parser.add_argument('--user', type=str, default='default', help="Username (default: default)")
    parser.add_argument('--password', type=str, default='thinkbig1', help="Password (default: thinkbig1)")
    parser.add_argument('--database', type=str, default='default', help="Database name (default: default)")

    parser.add_argument('--action', type=str, choices=['insert', 'select', 'update', 'delete'], required=True,
                        help="Operation type: insert/select/update/delete")
    parser.add_argument('--table', type=str, help="Target table name (only for insert operation)")
    parser.add_argument('--query', type=str, help="SQL query (required for select/update/delete)")
    parser.add_argument('--data', type=str, help="Data to insert (in JSON format)")

    parser.add_argument('--version', action='version', version=f"ClickHouse Tool v{TOOL_VERSION} ({DEVELOPER_INFO})")

    args = parser.parse_args()

    tool = ClickHouseTool(args.host, args.port, args.user, args.password, args.database)

    if args.action == 'insert':
        if not args.table or not args.data:
            logging.error("Insert operation requires --table and --data")
        else:
            try:
                data = json.loads(args.data)
                if isinstance(data, dict):
                    data = [data]
                tool.insert_data(args.table, data)
            except json.JSONDecodeError:
                logging.error("Data format error, please use JSON format")
    elif args.action == 'select':
        if not args.query:
            logging.error("Select operation requires --query")
        else:
            result = tool.select_data(args.query)
            print("Query result:", result)
    elif args.action == 'update':
        if not args.query:
            logging.error("Update operation requires --query")
        else:
            tool.update_data(args.query)
    elif args.action == 'delete':
        if not args.query:
            logging.error("Delete operation requires --query")
        else:
            tool.delete_data(args.query)

    tool.close()

if __name__ == '__main__':
    cli()
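A hedged usage example against the monitor database configured earlier. The server's TCP port was moved to 6000 above while the script defaults to 9000, so pass it explicitly; the 3fs database is the one created by 3fs-monitor.sql:

python3 clickhouse_tool.py --host 172.20.99.94 --port 6000 \
  --action select --query 'SHOW TABLES FROM `3fs`'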



6. Explanation of Selected Commands

These two commands run the data placement model (data_placement.py) and chain table generation (gen_chain_table.py). Their parameters are explained below:


6.1 The First Command
python /home/3fs/deploy/data_placement/src/model/data_placement.py \
   -ql -relax -type CR --num_nodes 3 --replication_factor 3 --min_targets_per_disk 6

Purpose
Runs the data placement model (data_placement.py), which decides how data chunks are placed across the storage cluster.

Parameter breakdown
  • -ql
    • Possibly "quick launch" or "query log"; check the data_placement.py source for the exact meaning.
  • -relax
    • Likely "relax constraints", loosening the optimization model to allow more freedom.
  • -type CR
    • Sets the placement type to CR (presumably Chain Replication).
  • --num_nodes 3
    • The cluster has 3 storage nodes, i.e. data is spread across 3 machines.
  • --replication_factor 3
    • Replication factor of 3: every piece of data is stored as 3 replicas.
  • --min_targets_per_disk 6
    • Each disk must hold at least 6 storage targets (file chunks or data shards).

6.2 The Second Command
python /home/3fs/deploy/data_placement/src/setup/gen_chain_table.py \
   --chain_table_type CR --node_id_begin 10001 --node_id_end 10005 \
   --num_disks_per_node 16 --num_targets_per_disk 6 \
   --target_id_prefix 1 --chain_id_prefix 9 \
   --incidence_matrix_path output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1/incidence_matrix.pickle

Purpose
Generates the chain table configuration, presumably used for data replication and routing between storage nodes.

Parameter breakdown
  • --chain_table_type CR
    • Chain table type CR (presumably Chain Replication).
  • --node_id_begin 10001 --node_id_end 10005
    • Node IDs in the generated table run from 10001 to 10005, i.e. 5 storage nodes.
  • --num_disks_per_node 16
    • Each storage node has 16 disks.
  • --num_targets_per_disk 6
    • Each disk holds up to 6 storage targets (shards or file chunks).
  • --target_id_prefix 1
    • Target IDs are prefixed with 1 (presumably for unique identification).
  • --chain_id_prefix 9
    • Generated chain IDs are prefixed with 9 (presumably to distinguish chain groups).
  • --incidence_matrix_path output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1/incidence_matrix.pickle
    • Points to the incidence_matrix.pickle file, likely the incidence matrix produced by the placement model, describing the mapping between data chunks and disks.

The first command (data_placement.py)
  • Computes how to place data chunks across 3 storage nodes so that every chunk has 3 replicas and every disk holds at least 6 targets.
  • -ql and -relax likely tune the placement optimization.
The second command (gen_chain_table.py)
  • Generates the chain table structure for 5 storage nodes (10001 to 10005), 16 disks per node, 6 targets per disk.
  • Likely used for inter-node replication, query optimization, and storage recovery strategies.
