😉Best Practices

Note: The following names, IP addresses and other details are examples.

Environment Overview

Server

IP Address

Configuration

Component

Node 1

192.168.1.1

GPU * 1

controller autonomys-node

proof-server nats-server

Node 2

192.168.1.2

GPU * 1

controller autonomys-node

proof-server nats-server

Node 3

192.168.1.3

GPU * 1

controller autonomys-node

proof-server nats-server

Plotter 1

192.168.1.4

GPU * 4

autonomys-plot-server-0

autonomys-plot-server-1

autonomys-plot-server-2

autonomys-plot-server-3

sharded-cache full-piece-cache

Plotter 2

192.168.1.5

GPU * 4

autonomys-plot-server-0

autonomys-plot-server-1

autonomys-plot-server-2

autonomys-plot-server-3

sharded-cache full-piece-cache

Storage 1

192.168.1.6

8T NVMe SSD * 4

/mnt/nvme0n1

/mnt/nvme0n2

/mnt/nvme1n2

/mnt/nvme1n1

autonomys-plot-client

Storage 2

192.168.1.7

8T NVMe SSD * 4

/mnt/nvme0n1

/mnt/nvme0n2

/mnt/nvme1n1

/mnt/nvme1n2

autonomys-plot-client

Cluster Start Command

Start by launching NATS, then follow the instructions below to configure Supervisor’s parameters. Once configured, simply run the following command to start all programs:

bashCopy codesupervisorctl start all

Supervisor Configuration

Node Configuration

Each node requires the deployment of 4 components: controller autonomys-node proof-server nats-server

Deployment sequence: nats-server -> autonomys-node -> controller -> proof-server

nats-server

This software requires the JetStream feature to be enabled in nats-server. To activate JetStream, simply start nats-server with the --jetstream flag.

For nats-server configuration, please refer to the official NATS documentation as well as the Autonomys NATS configuration documentation.

Here is an example configuration for nats-server for your reference.

server_name=n1-cluster
max_payload = 3MB

jetstream {
   store_dir=/var/nats-data
}


cluster {
  name: c1-cluster
  listen: 0.0.0.0:4248
  routes: [
    nats://192.168.0.1:4248
    nats://192.168.0.2:4248
  ]
}

autonomys-controller

# autonomys-controller Configuration
# /etc/supervisor/conf.d/autonomys-controller.conf

[program:autonomys-controller]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 controller --tmp --node-rpc-url ws://10.30.1.2:9944
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-controller.log

autonomys-node

# autonomys-node Configuration
# /etc/supervisor/conf.d/autonomys-node.conf

[program:autonomys-node]
command=/root/autonomys/autonomys-node run --base-path /var/autonomys-node --farmer --rpc-listen-on 0.0.0.0:9944 --chain mainnet --sync full --rpc-methods unsafe --rpc-cors all
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-node.log

autonomys-proof-server

# autonomys-proof-server Configuration
# /etc/supervisor/conf.d/autonomys-proof-server.conf

[program:autonomys-proof-server]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 proof-server
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=0
redirect_stderr=true
stdout_logfile_maxbytes=500MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-proof-server.log

Explanation of Startup Command Parameters and Environment Variables:

--nats-server : This parameter is used to specify the address of the NATS server.
CUDA_VISIBLE_DEVICES: This environment variable is used to specify which GPU to use. For example, 0 represents GPU0, 1 represents GPU1, and so on.

Plotter Configuration (Example with 4 GPUs)

Each plotter requires the deployment of e components: autonomys-plot-server, autonomys-sharded-cacheand autonomys-full-piece-cache

The autonomys-plot-server component retrieves pieces from both the autonomys-sharded-cache and autonomys-full-piece-cache components for use on the plotting drive.

autonomys-sharded-cache

# sharded-cache Configuration
# /etc/supervisor/conf.d/autonomys-sharded-cache.conf

[program:autonomys-sharded-cache]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 sharded-cache path=/var/autonomys-sharded-cache
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-sharded-cache.log

Explanation of Startup Command Parameters:

--nats-server: Specifies the address of the NATS server.
path=/path/to/autonomys-sharded-cache: Specifies the storage path for the piece cache.

autonomys-full-piece

# autonomys-full-piece 配置
# /etc/supervisor/conf.d/autonomys-full-piece.conf

[program:autonomys-full-piece]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 full-piece-sharded-cache --tmp path=/var/autonomys-full-piece
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-full-piece.log

Explanation of Startup Command Parameters:

--nats-server : This parameter is used to specify the address of the NATS server.
path=/path/to/autonomys-full-piece: This parameter is used to specify the storage path for the piece cache.

autonomys-plot-server

# autonomys-plot-server 配置文件
# /etc/supervisor/conf.d/autonomys-plot-server.conf

[group:autonomys-plot-server]
programs=autonomys-plot-server-0,autonomys-plot-server-1,autonomys-plot-server-2,autonomys-plot-server-3
[program:autonomys-plot-server-0]
command=numactl -C 0-31 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9966 /var/plot-server/base-path-0
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=0
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-0.log

[program:autonomys-plot-server-1]
command=numactl -C 96-127 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9967 /var/plot-server/base-path-1
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=1
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-1.log

[program:autonomys-plot-server-2]
command=numactl -C 96-127 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9968 /var/plot-server/base-path-2
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=2
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-2.log

[program:autonomys-plot-server-3]
command=numactl -C 144-175 -l /root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-server --priority-cache --listen-port 9969 /var/plot-server/base-path-3
autorestart=true
user=root
environment=CUDA_VISIBLE_DEVICES=3
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plotter-3.log

Explanation of Startup Command Parameters and Environment Variables:

--nats-server: Specifies the address of the NATS server.
CUDA_VISIBLE_DEVICES: Sets the GPU to be used, where 0 represents GPU0, 1 represents GPU1, and so forth.
GPU_CONCURRENCY: Increasing this value raises GPU memory usage. Adjusting this variable may be beneficial when using GPUs of different models.

It is important to note that when using the numactl tool to bind CPU cores, you should consider the NUMA affinity of the GPU to achieve optimal performance.

You can use the nvidia-smi topo -m command to check the NUMA affinity of the GPU.

# nvidia-smi topo -m
        GPU0    GPU1    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      SYS     NODE    NODE    0-47,96-143     0               N/A
GPU1     X      SYS     NODE    NODE    0-47,96-143     0               N/A
GPU2    SYS      X      SYS     SYS     48-95,144-191   1               N/A
GPU3    SYS      X      SYS     SYS     48-95,144-191   1               N/A
NIC0    NODE    SYS      X      PIX
NIC1    NODE    SYS     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1

Storage Configuration (Example with 4 Drives)

autonomys-plot-client

# autonomys-plot-client Configuration
# /etc/supervisor/conf.d/autonomys-plot-client.conf

[program:autonomys-plot-client]
command=/root/autonomys/autonomys-farmer cluster --nats-server nats://192.168.1.1:4222 --nats-server nats://192.168.1.2:4222 --nats-server nats://192.168.1.2:4222 plot-client --reward-address stBR..S8V  path=/mnt/nvme0n1/,sectors=8000 path=/mnt/nvme0n2/,sectors=8000 path=/mnt/nvme1n0/,sectors=8000 path=/mnt/nvme1n1/,sectors=8000
autorestart=true
user=root
redirect_stderr=true
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=2
stdout_logfile=/var/log/autonomys-plot-client.log

Explanation of Startup Command Parameters:

--nats-server : Used to specify the address of the NATS server.
path=/path/to/plot-dir,sectors=8000: Specifies the file path for plots as well as the number of sectors for the plot, with 8000 as the sector count in this example.

Last updated 11 months ago