Note: The following names, IP addresses and other details are examples.
Environment Overview

| Server | IP Address | Configuration | Components |
| --- | --- | --- | --- |
| Node 1 | 192.168.1.1 | GPU × 1 | controller, autonomys-node, proof-server, nats-server |
| Node 2 | 192.168.1.2 | GPU × 1 | controller, autonomys-node, proof-server, nats-server |
| Node 3 | 192.168.1.3 | GPU × 1 | controller, autonomys-node, proof-server, nats-server |
| Plotter 1 | 192.168.1.4 | GPU × 4 | autonomys-plot-server-0, autonomys-plot-server-1, autonomys-plot-server-2, autonomys-plot-server-3, sharded-cache, full-piece-cache |
| Plotter 2 | 192.168.1.5 | GPU × 4 | autonomys-plot-server-0, autonomys-plot-server-1, autonomys-plot-server-2, autonomys-plot-server-3, sharded-cache, full-piece-cache |
| Storage 1 | 192.168.1.6 | 8 TB NVMe SSD × 4 (/mnt/nvme0n1, /mnt/nvme0n2, /mnt/nvme1n1, /mnt/nvme1n2) | autonomys-plot-client |
| Storage 2 | 192.168.1.7 | 8 TB NVMe SSD × 4 (/mnt/nvme0n1, /mnt/nvme0n2, /mnt/nvme1n1, /mnt/nvme1n2) | autonomys-plot-client |
Cluster Start Command
Start by launching NATS, then follow the instructions below to configure Supervisor's parameters. Once configured, run the following command to start all programs:
```bash
supervisorctl start all
```
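For the first step (launching NATS), a minimal configuration is usually enough. The sketch below raises the NATS message size limit for cluster traffic; the 2MB value and the /etc/nats/nats.config path are assumptions to adapt to your environment:

```bash
# Create a minimal NATS config; the 2MB payload limit is an assumption
cat > /etc/nats/nats.config <<'EOF'
max_payload = 2MB
EOF

# Launch NATS with the config (or let Supervisor manage it, as shown below)
nats-server -c /etc/nats/nats.config
```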
Supervisor Configuration
Node Configuration
Each node requires the deployment of 4 components: controller, autonomys-node, proof-server, and nats-server.
Explanation of Startup Command Parameters and Environment Variables:
--nats-server: Specifies the address of the NATS server.
CUDA_VISIBLE_DEVICES: This environment variable is used to specify which GPU to use. For example, 0 represents GPU0, 1 represents GPU1, and so on.
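Putting these together, the Supervisor entries for Node 1 might look like the following sketch. The binary locations under /opt/autonomys, the default NATS port 4222, and all flags other than --nats-server are assumptions; substitute the actual startup commands for your deployment:

```ini
; /etc/supervisor/conf.d/node.conf - hypothetical sketch for Node 1 (192.168.1.1)
[program:nats-server]
command=/usr/local/bin/nats-server -c /etc/nats/nats.config
autorestart=true

[program:autonomys-node]
; node startup flags are deployment-specific and omitted here
command=/opt/autonomys/autonomys-node
autorestart=true

[program:controller]
command=/opt/autonomys/controller --nats-server nats://192.168.1.1:4222
autorestart=true

[program:proof-server]
; CUDA_VISIBLE_DEVICES=0 pins the node's single GPU
environment=CUDA_VISIBLE_DEVICES="0"
command=/opt/autonomys/proof-server --nats-server nats://192.168.1.1:4222
autorestart=true
```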
Plotter Configuration (Example with 4 GPUs)
Each plotter requires the deployment of 3 components: autonomys-plot-server, autonomys-sharded-cache, and autonomys-full-piece-cache.
The autonomys-plot-server component retrieves pieces from both the autonomys-sharded-cache and autonomys-full-piece-cache components for use in plotting.
Explanation of Startup Command Parameters and Environment Variables:
--nats-server: Specifies the address of the NATS server.
CUDA_VISIBLE_DEVICES: Sets the GPU to be used, where 0 represents GPU0, 1 represents GPU1, and so forth.
GPU_CONCURRENCY: Increasing this value raises GPU memory usage. Adjusting this variable may be beneficial when using GPUs of different models.
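For a 4-GPU plotter, the pattern is one autonomys-plot-server instance per GPU, each pinned via CUDA_VISIBLE_DEVICES. The following is a hypothetical sketch; the binary paths, the GPU_CONCURRENCY value of 4, and all flags other than --nats-server are assumptions:

```ini
; /etc/supervisor/conf.d/plotter.conf - hypothetical sketch for Plotter 1 (192.168.1.4)
[program:autonomys-plot-server-0]
; instance 0 pins GPU0; the GPU_CONCURRENCY value is an assumption to tune
environment=CUDA_VISIBLE_DEVICES="0",GPU_CONCURRENCY="4"
command=/opt/autonomys/autonomys-plot-server --nats-server nats://192.168.1.1:4222
autorestart=true

[program:autonomys-plot-server-1]
environment=CUDA_VISIBLE_DEVICES="1",GPU_CONCURRENCY="4"
command=/opt/autonomys/autonomys-plot-server --nats-server nats://192.168.1.1:4222
autorestart=true

; autonomys-plot-server-2 and -3 follow the same pattern with GPUs 2 and 3

[program:autonomys-sharded-cache]
command=/opt/autonomys/autonomys-sharded-cache --nats-server nats://192.168.1.1:4222
autorestart=true

[program:autonomys-full-piece-cache]
command=/opt/autonomys/autonomys-full-piece-cache --nats-server nats://192.168.1.1:4222
autorestart=true
```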
Note that when using the numactl tool to bind CPU cores, you should take the GPU's NUMA affinity into account to achieve optimal performance.
You can check the NUMA affinity of each GPU with the nvidia-smi topo -m command:
```
# nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    SYS     SYS     NODE    NODE    0-47,96-143     0               N/A
GPU1    NODE     X      SYS     SYS     NODE    NODE    0-47,96-143     0               N/A
GPU2    SYS     SYS      X      NODE    SYS     SYS     48-95,144-191   1               N/A
GPU3    SYS     SYS     NODE     X      SYS     SYS     48-95,144-191   1               N/A
NIC0    NODE    NODE    SYS     SYS      X      PIX
NIC1    NODE    NODE    SYS     SYS     PIX     X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
```
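In the topology above, GPU0 and GPU1 sit on NUMA node 0 (CPUs 0-47,96-143) while GPU2 and GPU3 sit on NUMA node 1 (CPUs 48-95,144-191), so each plot-server instance should be bound to the NUMA node of its GPU. A hypothetical sketch; the binary path and flags other than --nats-server are assumptions:

```bash
# GPU0 lives on NUMA node 0: bind CPU and memory allocation there
CUDA_VISIBLE_DEVICES=0 numactl --cpunodebind=0 --membind=0 \
    /opt/autonomys/autonomys-plot-server --nats-server nats://192.168.1.1:4222

# GPU2 lives on NUMA node 1
CUDA_VISIBLE_DEVICES=2 numactl --cpunodebind=1 --membind=1 \
    /opt/autonomys/autonomys-plot-server --nats-server nats://192.168.1.1:4222
```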
Storage Configuration
Each storage server requires the deployment of 1 component: autonomys-plot-client.
Explanation of Startup Command Parameters:
--nats-server: Specifies the address of the NATS server.
path=/path/to/plot-dir,sectors=8000: Specifies the file path for a plot as well as its number of sectors, with 8000 as the sector count in this example.
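A corresponding Supervisor entry for Storage 1 might look like the sketch below, declaring one path=...,sectors=... pair per NVMe mount. The binary path is an assumption, and the sector count should be sized to your drives:

```ini
; /etc/supervisor/conf.d/storage.conf - hypothetical sketch for Storage 1 (192.168.1.6)
[program:autonomys-plot-client]
command=/opt/autonomys/autonomys-plot-client --nats-server nats://192.168.1.1:4222 path=/mnt/nvme0n1,sectors=8000 path=/mnt/nvme0n2,sectors=8000 path=/mnt/nvme1n1,sectors=8000 path=/mnt/nvme1n2,sectors=8000
autorestart=true
```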