1
1
.. _coral2:
2
2
3
- ==================================
4
- CORAL2: Native Flux on Cray Shasta
5
- ==================================
3
+ ===========================
4
+ CORAL2: Flux on Cray Shasta
5
+ ===========================
6
6
7
- The LLNL, LBNL, and ORNL next-generation systems like RZNevada, Perlmutter,
8
- El Capitan, and Frontier are in various stages of early access. They are
9
- similar in that they all use the HPE Cray Shasta platform, which requires
10
- a few additional components to integrate completely with Flux.
7
+ The LLNL, LBNL, and ORNL systems like Tioga, Perlmutter,
8
+ El Capitan, and Frontier are similar in that they all use the
9
+ HPE Cray Shasta platform, which requires
10
+ an additional component to integrate completely with Flux.
11
11
12
12
--------------
13
13
Things to Know
@@ -18,96 +18,84 @@ Things to Know
18
18
Attempting to run further multi-node jobs will cause the excess jobs
19
19
to fail. There is no limit on *submitted* multi-node jobs, and
20
20
single-node jobs do not count towards the limit.
21
- #. All nested Flux instances (e.g. instances created with ``flux batch``,
22
- ``flux alloc``, or ``flux submit ... flux start``
23
- should meet one of the following criteria:
21
+ #. All Flux instances should meet one of the following criteria:
24
22
25
23
- Occupy a single node
26
24
- Have exclusive access to the nodes they are running on (e.g. they
27
25
do not share their resources with sibling instances).
28
26
29
27
Instances that do not meet one of the above criteria will not work properly.
30
28
29
+ By default Flux reserves ports 11000-11999 for itself. At any given
30
+ level of the Flux hierarchy, this can be changed by configuring Flux
31
+ to load the ``cray_pals_port_distributor`` jobtap plugin with a different
32
+ range of ports, like so:
33
+
34
+ .. code-block:: toml
35
+
36
+     [job-manager]
37
+     plugins = [
38
+       { load = "cray_pals_port_distributor.so", conf = { port-min = 11000, port-max = 13000 } }
39
+     ]
40
+
31
41
------------------------
32
42
Building Flux for CORAL2
33
43
------------------------
34
44
35
45
The basic steps to building Flux for Cray Shasta systems are as follows:
36
46
37
- #. :ref:`Build flux-core and flux-sched manually <manual_installation>`
38
- with some prefix *P*.
47
+ #. :ref:`Build flux-core (version >= 0.49.0) and flux-sched manually
48
+ <manual_installation>` with some prefix *P*.
39
49
#. Build `flux-coral2 <https://github.com/flux-framework/flux-coral2>`_
40
50
with the same prefix *P*.
41
- #. Create a Flux config file specifying that the ``cray_pals_port_distributor.so``
42
- plugin should be loaded with some given port range (see below for an example).
43
- If you have other config files, put the new file in with the others.
44
- Before launching Flux, point Flux to the *directory* containing your config
45
- file(s) by setting the ``FLUX_CONF_DIR`` environment variable, or by passing
46
- ``-o"-c/path/to/config"`` to ``flux start``.
47
- #. As an alternative to creating a config file and setting ``FLUX_CONF_DIR``,
48
- you can, after starting Flux, execute ``flux jobtap load
49
- cray_pals_port_distributor.so port-min=$N port-max=$M`` for some *N* and *M*.
50
-
51
-
52
- If you see job failures with an error message like "no cray_pals_port_distribution
53
- event posted", check that you have the ``cray_pals_port_distributor.so`` plugin
54
- loaded by running ``flux jobtap list``. If you don't see it in the list, retry
55
- step 3 or 4 above.
56
-
57
- A script to build Flux is below.
58
-
59
- .. code-block:: sh
60
-
61
- #!/bin/bash
62
-
63
- set -e
64
-
65
- PREFIX=$HOME/local # a good default, but modify as needed
66
- PORT_MIN=11000 # a good default, but modify as needed
67
- PORT_MAX=12000 # a good default, but modify as needed
68
-
69
- # Step 1: Build flux-core 0.29 or later
70
-
71
- wget https://github.com/flux-framework/flux-core/releases/download/v0.29.0/flux-core-0.29.0.tar.gz
72
-
73
- tar -xzvf flux-core-0.29.0.tar.gz && cd flux-core-0.29.0
74
-
75
- ./configure --prefix=$PREFIX && make -j && make install && cd ..
76
-
77
- # The `flux` executable will now be in ~/local/bin/flux but it needs some
78
- # additional flux-coral2 extensions
79
-
80
51
81
- # Step 2: Build flux-sched 0.18 or later (optional but recommended)
52
+ ------------------
53
+ Flux with Cray PMI
54
+ ------------------
82
55
83
- wget https://github.com/flux-framework/flux-sched/releases/download/v0.18.0/flux-sched-0.18.0.tar.gz
56
+ Applications linked to Cray MPICH will work natively with Flux
57
+ provided the Cray MPICH library uses the PMI2 protocol instead of
58
+ the homespun Cray PMI and libPALS. For Flux to support libPALS,
59
+ flux-coral2 must be built (see above) and Flux must be configured
60
+ to offer libPALS support. This is done by setting the "pmi" shell
61
+ option to include "cray-pals" on a per-job basis like so:
84
62
85
- tar -xzvf flux-sched-0.18.0.tar.gz && cd flux-sched-0.18.0
63
+ .. code-block:: console
86
64
87
- ./configure --prefix=$PREFIX && make -j && make install && cd ..
65
+     $ flux submit -n2 -opmi=cray-pals ./mpi_hello
88
66
67
+ or by configuring Flux to offer such support by default, by adding
68
+ the following lines to the shell's ``initrc.lua`` file:
89
69
90
- # Step 3: Build flux-coral2
70
+ .. code-block:: lua
91
71
92
- git clone https://github.com/flux-framework/flux-coral2.git && cd flux-coral2
72
+     if shell.options['pmi'] == nil then
73
+         shell.options['pmi'] = 'cray-pals,simple'
74
+     end
93
75
94
- ./autogen.sh && ./configure --prefix=$PREFIX && make -j && make install
95
76
96
- libtool --finish $PREFIX/lib/flux/job-manager/
97
- libtool --finish $PREFIX/lib/flux/shell/plugins/
98
- cd ..
77
+ The lines should come before any call to load plugins.
99
78
79
+ If Flux jobs that use Cray MPICH end up as a collection of singletons,
80
+ that is usually a sign that Cray MPICH is trying to use libPALS.
100
81
101
- # Step 4: add a config file to automatically load a flux-coral2 plugin
82
+ -----------------------------
83
+ Configuring Flux with Rabbits
84
+ -----------------------------
102
85
103
- mkdir -p $PREFIX/etc/flux/config
86
+ In order for a Flux system instance to be able to allocate
87
+ rabbit storage, the ``dws-jobtap.so`` plugin must be loaded.
88
+ The plugin can be loaded in a config file like so:
104
89
105
- echo "[job-manager]
106
- plugins = [
107
- { load = \"cray_pals_port_distributor.so\", conf = { port-min = $PORT_MIN, port-max = $PORT_MAX } }
108
- ]
109
- " > $PREFIX/etc/flux/config/cray_pals_ports.toml
90
+ .. code-block:: toml
110
91
111
- echo "Done! Now set FLUX_CONF_DIR=$PREFIX/etc/flux/config
112
- in your environment and run with $PREFIX/bin/flux"
92
+     [job-manager]
93
+     plugins = [
94
+       { load = "dws-jobtap.so" }
95
+     ]
113
96
97
+ Also, the ``flux-coral2-dws`` systemd service must be started
98
+ on the same node as the rank 0 broker of the system instance
99
+ (i.e. the management node). The ``flux`` user must have
100
+ a kubeconfig file in its home directory granting it read
101
+ and write access.