
Introduction
============
I'm working on building a cluster of Raspberry Pi CM4 compute blades. I plan on
using the cluster to learn a bunch of stuff, but I need to be able to provision
the blades automatically. This repo is my work in that area.

There are some assumptions made:

1. All the systems involved are Debian-ish systems, particularly Ubuntu; the
   build system here assumes this. It may work on non-Ubuntu apt-based
   systems. For non-Debian systems, I've also been working on container
   builds that may work.
2. The primary target for this setup is Ubuntu 22.04. This still needs to be
   validated; a minimal sketch of such a check follows this list.
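
As a starting point for validating assumption 2, something like the following
could run on a blade before provisioning. This is only a sketch, not part of
the build system; the parsing of ``/etc/os-release`` and the exact checks are
assumptions on my part.

.. code-block:: python

   # Sketch: confirm the host is Debian-ish and warn if it isn't the
   # primary Ubuntu 22.04 target. Illustrative only.
   def check_target_os(path="/etc/os-release"):
       info = {}
       with open(path) as f:
           for line in f:
               key, sep, value = line.strip().partition("=")
               if sep:
                   info[key] = value.strip('"')
       os_id = info.get("ID", "")
       if os_id != "ubuntu" and "debian" not in info.get("ID_LIKE", ""):
           raise RuntimeError(f"not a Debian-ish system: {os_id!r}")
       if info.get("VERSION_ID") != "22.04":
           print(f"warning: primary target is 22.04, "
                 f"found {info.get('VERSION_ID')!r}")

   if __name__ == "__main__":
       check_target_os()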
There are three physical types of systems:

- ``dev`` indicates DEV compute blades.
- ``tpm`` indicates TPM compute blades.
- ``adm`` indicates an admin system, which will likely be a CM4 carrier board
  in non-blade form. I foresee two such systems: ``adm``, running
  administrative services (such as a PXE boot server), and ``gw``, managing
  networking with the outside world.
The `computeblade docs <https://docs.computeblade.com/>`_ describe the
different blade types.
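
To make the naming concrete, a hypothetical inventory for the planned cluster
could be modelled as below. The hostnames follow the conventions above, but
the structure itself is illustrative and not something the repo defines.

.. code-block:: python

   # Hypothetical inventory sketch: <type><number> hostnames per the
   # physical types above. Names and layout are illustrative only.
   INVENTORY = {
       "dev": [f"dev{n:02d}" for n in range(1, 6)],  # DEV compute blades
       "tpm": [f"tpm{n:02d}" for n in range(1, 6)],  # TPM compute blades
       "adm": ["adm"],  # administrative services (e.g. a PXE boot server)
       "gw": ["gw"],    # gateway to the outside world
   }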
Below is a diagram of the planned system.

.. graphviz::

   digraph cluster {
       subgraph {
           dev01;
           dev02;
           dev03;
           dev04;
           dev05;
           tpm01;
           tpm02;
           tpm03;
           tpm04;
           tpm05;
       }
       "poe-switch" -> dev01 [dir=both];
       "poe-switch" -> dev02 [dir=both];
       "poe-switch" -> dev03 [dir=both];
       "poe-switch" -> dev04 [dir=both];
       "poe-switch" -> dev05 [dir=both];
       "poe-switch" -> tpm01 [dir=both];
       "poe-switch" -> tpm02 [dir=both];
       "poe-switch" -> tpm03 [dir=both];
       "poe-switch" -> tpm04 [dir=both];
       "poe-switch" -> tpm05 [dir=both];
       "poe-switch" -> gw [dir=both];
       publicnet -> gw [dir=both];
   }

Hardware
--------

The hardware isn't slated to arrive until September at the earliest. I am
leaning towards having the 1TB NVMe drives go with the AI modules, and using
the gateway system as the storage machine if needed.

+----------------------------+----------+----------------------------------------+
| Item                       | Quantity | Notes                                  |
+============================+==========+========================================+
| TPM blade                  | 5        | TPM 2.0                                |
+----------------------------+----------+----------------------------------------+
| DEV blade                  | 6        | TPM 2.0, µSD, nRPIBOOT                 |
+----------------------------+----------+----------------------------------------+
| CM4                        | 10       | 8GB RAM, no eMMC/WiFi/BT               |
+----------------------------+----------+----------------------------------------+
| CM4                        | 2        | 8GB RAM, eMMC/WiFi/BT (gw, dev blade)  |
+----------------------------+----------+----------------------------------------+
| SAMSUNG 970 EVO Plus 500GB | 4/7      | 2280                                   |
+----------------------------+----------+----------------------------------------+
| SAMSUNG 970 EVO Plus 1TB   | 2/4      | 2280 (1 allocated to gw)               |
+----------------------------+----------+----------------------------------------+
| RTC module                 | 10       | DS3231                                 |
+----------------------------+----------+----------------------------------------+
| AI module                  | 3        | 2x Coral TPU                           |
+----------------------------+----------+----------------------------------------+
| CM4 carrier board          | 1        | Dual-homed, NVMe slot, Zymbit 4i       |
+----------------------------+----------+----------------------------------------+
| Netgear GS316PP            | 1        | 16-port PoE+ (183W)                    |
+----------------------------+----------+----------------------------------------+

Logical roles
-------------

Aside from the physical hardware types, there are essentially ``cluster`` nodes
and ``meta`` nodes. The ``cluster`` nodes are the compute blades, and ``meta``
nodes are any additional systems involved. Individual nodes might be tagged
with roles as well to indicate additional capabilities, such as ``ai`` for
``cluster`` nodes with TPUs or ``storage`` for ``cluster`` nodes with
additional (e.g. 1TB) storage.
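
To make the split concrete, the node classes and role tags might be modelled
as plain data like this. Which nodes carry which tags, and the structure
itself, are assumptions for illustration only.

.. code-block:: python

   # Hypothetical role model: every node is either "cluster" or "meta",
   # and may carry extra capability tags. Illustrative only.
   NODE_CLASS = {
       # cluster nodes: the compute blades
       **{f"dev{n:02d}": "cluster" for n in range(1, 6)},
       **{f"tpm{n:02d}": "cluster" for n in range(1, 6)},
       # meta nodes: everything else involved
       "adm": "meta",
       "gw": "meta",
   }

   ROLE_TAGS = {
       "dev01": {"ai"},       # e.g. a cluster node with Coral TPUs
       "dev02": {"storage"},  # e.g. a cluster node with a 1TB NVMe drive
   }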