Environment overview
[Figure: cluster layout, showing three node machines, two plotting machines, and two storage machines]
Node machine components: controller, autonomys-node, proof-server, nats-server
Plotting machine components: autonomys-plot-server-0, autonomys-plot-server-1, autonomys-plot-server-2, autonomys-plot-server-3, sharded-cache, full-piece-cache
Storage machine disks: 8T NVMe SSD * 4, mounted at /mnt/nvme0n1, /mnt/nvme0n2, /mnt/nvme1n1, /mnt/nvme1n2
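Cluster startup command
Start NATS first, then configure the Supervisor parameters as described in the sections below. Once configuration is complete, a single command starts all programs: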
supervisorctl start all
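When files under /etc/supervisor/conf.d have just been created or edited, Supervisor usually has to reload them before `start all` can see the new programs. The following is a minimal sketch of that sequence, not a required procedure. `SUPERVISORCTL` is a hypothetical override variable introduced here so the sequence can be previewed without touching a live Supervisor.

```shell
#!/bin/sh
# Reload Supervisor's configuration, then start every program.
# SUPERVISORCTL is a hypothetical override (not part of Supervisor itself).
SUPERVISORCTL="${SUPERVISORCTL:-supervisorctl}"

reload_and_start_all() {
    # reread: parse new or changed configuration files
    "$SUPERVISORCTL" reread || return 1
    # update: apply the changes to the running supervisord
    "$SUPERVISORCTL" update || return 1
    # start anything that is still stopped, then show the resulting state
    "$SUPERVISORCTL" start all || return 1
    "$SUPERVISORCTL" status
}
```

Calling `reload_and_start_all` on a machine runs the four supervisorctl subcommands in order and stops at the first one that fails.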
Supervisor configuration
Node machine configuration
Each node machine runs four components: controller, autonomys-node, proof-server, nats-server
Deployment order: nats-server -> autonomys-node -> controller -> proof-server
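The deployment order can also be enforced when starting the Supervisor-managed programs, instead of relying on `start all`. The sketch below assumes nats-server is already running outside Supervisor and uses the program names from the configuration files in this guide. `SUPERVISORCTL` is a hypothetical override variable added for previewing the order.

```shell
#!/bin/sh
# Start the node-machine programs one at a time, in the documented order.
# Assumption: nats-server was started beforehand and is not managed here.
SUPERVISORCTL="${SUPERVISORCTL:-supervisorctl}"

start_node_programs() {
    for program in autonomys-node autonomys-controller autonomys-proof-server; do
        # Stop at the first program that fails to start.
        "$SUPERVISORCTL" start "$program" || return 1
    done
}
```

Setting `SUPERVISORCTL=echo` before calling `start_node_programs` prints the three start commands without executing them.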
nats-server
The following nats-server configuration is an example for reference:
server_name=n1-cluster
max_payload = 3MB
jetstream {
    store_dir=/var/nats-data
}
cluster {
    name: c1-cluster
    listen: 0.0.0.0:4248
    routes: [
        nats://192.168.0.1:4248
        nats://192.168.0.2:4248
    ]
}
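Before starting NATS with a configuration like the one above, it can help to let nats-server parse the file and exit. The sketch below relies on the `-c` (configuration file) and `-t` (test configuration and exit) options of nats-server. The default path /etc/nats/nats-server.conf and the `NATS_SERVER_BIN` override are assumptions made for this example, not something this guide prescribes.

```shell
#!/bin/sh
# Validate a nats-server configuration file without starting the server.
# NATS_SERVER_BIN is a hypothetical override; the default path is an assumption.
NATS_SERVER_BIN="${NATS_SERVER_BIN:-nats-server}"

check_nats_config() {
    # -c selects the configuration file, -t tests it and exits.
    "$NATS_SERVER_BIN" -c "${1:-/etc/nats/nats-server.conf}" -t
}
```

For example, `check_nats_config /path/to/your.conf` returns a non-zero exit status if the file does not parse.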
autonomys-controller
# autonomys-controller configuration
# /etc/supervisor/conf.d/autonomys-controller.conf
[program:autonomys-controller]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 controller --tmp --node-rpc-url ws://10.30.1.2:9944
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-controller.log
autonomys-node
# autonomys-node configuration
# /etc/supervisor/conf.d/autonomys-node.conf
[program:autonomys-node]
command=/root/autonomys/autonomys-node run --base-path /var/autonomys-node --farmer --rpc-listen-on 0.0.0.0:9944 --chain mainnet --sync full --rpc-methods unsafe --rpc-cors all
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-node.log
autonomys-proof-server
# autonomys-proof-server configuration
# /etc/supervisor/conf.d/autonomys-proof-server.conf
[program:autonomys-proof-server]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 proof-server
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=0
redirect_stderr=true
stdout_logfile_maxbytes=500MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-proof-server.log
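Startup command parameters and environment variables:
--nats-server: address of a NATS server. The flag is repeated once per server in the cluster. Note that the example commands list 192.168.1.2 twice; substitute the addresses of your own servers.
CUDA_VISIBLE_DEVICES: environment variable that selects the GPU. 0 means GPU0, 1 means GPU1, and so on.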
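Because every component repeats the same `--nats-server` flags, the list can be built once and reused, which avoids copy-paste slips such as a duplicated address. This is a small sketch; the function name `nats_server_args` is hypothetical.

```shell
#!/bin/sh
# Build the repeated --nats-server flags from a list of host:port pairs.
nats_server_args() {
    args=""
    for endpoint in "$@"; do
        args="$args --nats-server nats://$endpoint"
    done
    # Trim the leading space before printing.
    printf '%s\n' "${args# }"
}
```

For example, `nats_server_args 192.168.1.1:4222 192.168.1.2:4222` prints the two flags in a form that can be pasted into a `command=` line.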
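Plotting machine configuration (4 GPUs as an example)
Each plotting machine runs three components: autonomys-plot-server, autonomys-sharded-cache, autonomys-full-piece-cache
The autonomys-plot-server component fetches pieces from the autonomys-sharded-cache and autonomys-full-piece-cache components for plotting.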
autonomys-sharded-cache
# sharded-cache configuration
# /etc/supervisor/conf.d/autonomys-sharded-cache.conf
[program:autonomys-sharded-cache]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 sharded-cache path=/var/autonomys-sharded-cache
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-sharded-cache.log
Startup command parameters:
--nats-server: address of a NATS server.
path=/path/to/autonomys-sharded-cache: directory where the piece cache is stored.
autonomys-full-piece
# autonomys-full-piece configuration
# /etc/supervisor/conf.d/autonomys-full-piece.conf
[program:autonomys-full-piece]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 full-piece-sharded-cache --tmp path=/var/autonomys-full-piece
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-full-piece.log
Startup command parameters:
--nats-server: address of a NATS server.
path=/path/to/autonomys-full-piece: directory where the full-piece data is stored.
autonomys-plot-server
# autonomys-plot-server configuration file
# /etc/supervisor/conf.d/autonomys-plot-server.conf
[group:autonomys-plot-server]
programs=autonomys-plot-server-0,autonomys-plot-server-1,autonomys-plot-server-2,autonomys-plot-server-3
[program:autonomys-plot-server-0]
command=numactl -C 0-31 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9966 /var/plot-server/base-path-0
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=0
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-0.log
[program:autonomys-plot-server-1]
command=numactl -C 96-127 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9967 /var/plot-server/base-path-1
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=1
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-1.log
[program:autonomys-plot-server-2]
command=numactl -C 48-79 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9968 /var/plot-server/base-path-2
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=2
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-2.log
[program:autonomys-plot-server-3]
command=numactl -C 144-175 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9969 /var/plot-server/base-path-3
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=3
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-3.log
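The four program sections above differ only in the GPU index, the CPU core range, the listen port, and the base path, so they can be generated rather than copied by hand. The sketch below is one way to do that. The function name, the two-server `NATS_ARGS` value, and the core ranges in the usage note are assumptions chosen for illustration; the core range for each GPU should come from that GPU's own NUMA node.

```shell
#!/bin/sh
# Print one Supervisor [program:...] section per GPU.
# Arguments are CPU core ranges, one per GPU index starting at 0.
# NATS_ARGS and FARMER_BIN are assumptions mirroring the example paths.
NATS_ARGS="--nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222"
FARMER_BIN="/root/autonomys/autonomys-farmer"

render_plot_servers() {
    gpu=0
    for cores in "$@"; do
        port=$((9966 + gpu))   # 9966, 9967, ... as in the sections above
        cat <<EOF
[program:autonomys-plot-server-$gpu]
command=numactl -C $cores -l $FARMER_BIN cluster $NATS_ARGS plot-server --priority-cache --listen-port $port /var/plot-server/base-path-$gpu
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=$gpu
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-$gpu.log
EOF
        gpu=$((gpu + 1))
    done
}
```

For example, `render_plot_servers 0-31 96-127 48-79 144-175` prints four sections; the `[group:autonomys-plot-server]` header still has to be written separately.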
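Startup command parameters and environment variables:
--nats-server: address of a NATS server.
CUDA_VISIBLE_DEVICES: environment variable that selects the GPU. 0 means GPU0, 1 means GPU1, and so on.
GPU_CONCURRENCY: increasing this value raises GPU memory usage. Consider adjusting it when using different GPU models.
Note: when binding CPU cores with numactl, take the NUMA affinity of each GPU into account to get the best performance.
The nvidia-smi topo -m command shows the GPU NUMA affinity: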
# nvidia-smi topo -m
        GPU0    GPU1    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0    X       SYS     NODE    NODE    0-47,96-143     0               N/A
GPU1    X       SYS     NODE    NODE    0-47,96-143     0               N/A
GPU2    SYS     X       SYS     SYS     48-95,144-191   1               N/A
GPU3    SYS     X       SYS     SYS     48-95,144-191   1               N/A
NIC0    NODE    SYS     X       PIX
NIC1    NODE    SYS     PIX     X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
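To choose a `numactl -C` range for each plot server, the CPU affinity column can be pulled out of the topology output. The following is a rough sketch that assumes the row layout shown above, where the topology cells (X, SYS, NODE, and so on) are non-numeric and the first field starting with a digit is the CPU affinity. The function name `gpu_cpu_affinity` is hypothetical.

```shell
#!/bin/sh
# Read `nvidia-smi topo -m` output on stdin and print "GPUn <cpu-affinity>"
# for every GPU row.
gpu_cpu_affinity() {
    awk '/^GPU[0-9]+/ {
        # Topology cells are non-numeric, so the first field that starts
        # with a digit is taken to be the CPU Affinity column.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9]/) { print $1, $i; next }
        }
    }'
}
```

On a plotting machine this would be used as `nvidia-smi topo -m | gpu_cpu_affinity`. Each plot server's core range should then be a subset of the affinity printed for its GPU.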
Storage machine configuration (4 disks as an example)
autonomys-plot-client
# autonomys-plot-client configuration
# /etc/supervisor/conf.d/autonomys-plot-client.conf
[program:autonomys-plot-client]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-client --reward-address stBR..S8V path=/mnt/nvme0n1/,sectors=8000 path=/mnt/nvme0n2/,sectors=8000 path=/mnt/nvme1n1/,sectors=8000 path=/mnt/nvme1n2/,sectors=8000
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plot-client.log
Startup command parameters:
--nats-server: address of a NATS server.
path=/path/to/plot-dir,sectors=8000: directory that holds the plot files and the number of sectors to plot there.
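With several disks per storage machine, the `path=...,sectors=...` arguments follow a fixed pattern and can be assembled from the list of mount points. This is a minimal sketch that assumes every disk gets the same sector count; the function name `plot_farm_args` is hypothetical.

```shell
#!/bin/sh
# Print one `path=<dir>/,sectors=<n>` argument per mount point.
# Usage: plot_farm_args <sectors> <dir>...
plot_farm_args() {
    sectors="$1"
    shift
    farms=""
    for dir in "$@"; do
        # Strip any trailing slash, then add exactly one back.
        farms="$farms path=${dir%/}/,sectors=$sectors"
    done
    # Trim the leading space before printing.
    printf '%s\n' "${farms# }"
}
```

For example, `plot_farm_args 8000 /mnt/nvme0n1 /mnt/nvme0n2 /mnt/nvme1n1 /mnt/nvme1n2` prints the four arguments used after `--reward-address` in the command above.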