xdmod/README.md
Lines changed: 66 additions & 12 deletions
Original file line number
Diff line number
Diff line change
@@ -1,17 +1,22 @@
1
1
## Overview
2
2
3
-
In this part of the tutorial we are going to install and configure Open XDMoD.
3
+
**NOTE:**
4
+
Because of COVID, this tutorial is virtual and much shorter than anticipated, so this part will be more of an interactive demo. Some parts will be covered more quickly than usual; however, our team is available in Slack and the Zoom chat to answer any questions you may have.
5
+
6
+
In this part of the tutorial we are going to go over the installation and configuration of Open XDMoD.
4
7
The base component of Open XDMoD uses the job accounting logs from the HPC
5
-
resource manager as the data source. We are also going to install the optional Job Performance Module. This
6
-
allows Open XDMoD to also display performance data for HPC jobs.
8
+
resource manager as the data source. We have also installed the optional Job Performance Module. This allows Open XDMoD to also display performance data for HPC jobs.
7
9
8
10
The asciinema recordings are not meant to be used on their own; they are intended for use in a "live" demonstration.
9
-
Command Line Demos in a Light color, are meant to be watched. Dark theme are interactive
11
+
12
+
Command-line demos in a light color theme are meant to be watched; those in a dark theme are interactive.
10
13
11
14
`VIM` is used to edit files in this tutorial. If you prefer a different editor, please install it on the xdmod container.
12
15
13
16
## Submit some jobs to the cluster
14
17
18
+
**NOTE:** For the Gateways2020 tutorial the presenter has already done this on their machine. If you are interested in running this on your own, please do so.
19
+
15
20
Before we install and configure XDMoD we are going to submit
16
21
some HPC jobs to the cluster. These jobs will run while we go through
17
22
the install and then we will be able to view the job information
@@ -24,16 +29,42 @@ ssh -p6222 hpcadmin@localhost
24
29
25
30
Run the provided script that submits several jobs to the cluster. These jobs
26
31
run as multiple different users with different job sizes and durations. The
27
-
purpose of this is to generate data to display in Open XDMoD. This, of course,
28
-
would not be required on a production deployment. This script should be run
29
-
as the hpcadmin user as it uses `sudo` to submit jobs as different cluster
30
-
users.
32
+
purpose of this is to generate data to display in Open XDMoD.
33
+
34
+
**NOTE**: This, of course, would not be required on a production deployment.
35
+
36
+
This script should be run as the hpcadmin user as it uses `sudo` to submit jobs as different cluster users.
31
37
```bash
32
38
submit_jobs.sh
33
39
```
34
40
41
+
Output should look similar to:
42
+
```bash
43
+
[hpcadmin@xdmod ~]$ submit_jobs.sh
44
+
Submitted batch job 2
45
+
Submitted batch job 3
46
+
Submitted batch job 4
47
+
Submitted batch job 5
48
+
Submitted batch job 6
49
+
Submitted batch job 7
50
+
Submitted batch job 8
51
+
Submitted batch job 9
52
+
Submitted batch job 10
53
+
Submitted batch job 11
54
+
Submitted batch job 12
55
+
Submitted batch job 13
56
+
Submitted batch job 14
57
+
Submitted batch job 15
58
+
Submitted batch job 16
59
+
Submitted batch job 17
60
+
Submitted batch job 18
61
+
Submitted batch job 19
62
+
```
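
To check on these jobs later, it can help to collect the job IDs from the submission output. The following is a minimal sketch, assuming the output consists of `Submitted batch job N` lines as shown above; the function name `job_ids_from_output` is hypothetical and is not part of the tutorial's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read submission output on stdin and print the
# job IDs as one comma-separated list. Lines that do not match the
# "Submitted batch job N" pattern (such as the shell prompt) are ignored.
job_ids_from_output() {
    awk '/^Submitted batch job [0-9]+$/ {
             ids = ids (ids == "" ? "" : ",") $4
         }
         END { print ids }'
}
```

If the output was saved to a file (for example `submit.log`, an assumed name), the resulting list could be passed to Slurm's `squeue -j` to see which of the jobs are still queued or running: `squeue -j "$(job_ids_from_output < submit.log)"`.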
63
+
35
64
## Open XDMoD Installation
36
65
66
+
**Note** This part will be brief in the Gateways2020 tutorial. This process has already been done as part of the Docker setup.
67
+
37
68
For this tutorial, the Open XDMoD software will be installed in the `xdmod` container.
38
69
Open XDMoD will use the MySQL database from the `mysql` container. Since we
39
70
will also be installing the optional Job Performance module we also run
@@ -54,6 +85,8 @@ Package Installation:
54
85
55
86
## Open XDMoD Configuration
56
87
88
+
**Note** This part will be brief in the Gateways2020 tutorial. This process has already been done as part of the Docker setup.
89
+
57
90
### Prerequisites
58
91
59
92
The following information is needed by Open XDMoD:
@@ -157,7 +192,7 @@ The Job Performance module is optional, but highly recommended.
157
192
158
193
[Job Performance](https://supremm.xdmod.org) data - for the Open source release we'll try to provide support for [Performance Co-Pilot (PCP)](https://pcp.io).
159
194
We chose PCP because it is included by default in CentOS / Red Hat.
160
-
In XSEDE we use tacc_stats and PCP (depending on the resource provider). and we have also used LDMS, Cray RUR and are aware of groups using Ganglia too.
195
+
In XSEDE we use tacc_stats and PCP (depending on the resource provider). We are also aware of groups using LDMS, Cray RUR, and Ganglia. We have a team now looking into Prometheus.
161
196
162
197
PCP has been [installed](https://github.com/ubccr/hpc-toolset-tutorial/blob/master/slurm/install.sh#L80-L87) and configured on the compute nodes.
163
198
This tutorial uses a cut-down list of PCP metrics from the recommended metrics for a production HPC system.
@@ -226,11 +261,13 @@ This is going to produce A LOT of output. Each of these commands have flags tha
226
261
227
262
## User / PI Names
228
263
264
+
**NOTE**: Feel free to skip this part in the Gateways2020 tutorial, as it does not affect the use of the system.
265
+
229
266
The resource manager logs contain the system usernames of the users that submitted jobs.
230
267
To display the full names in Open XDMoD you must provide a data file that contains the
231
268
full name of each user for each system username. This file is in a `csv` format.
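
One way to build such a file is to derive it from the system account database. The following is a minimal sketch under stated assumptions: that each `csv` row has the form `username,first name,last name`, and that full names are stored in the GECOS field of passwd-style records. The function name `make_names_csv` is hypothetical; check the Open XDMoD documentation for the exact format before importing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read passwd-format records on stdin and print
# "username,first name,last name" rows (the row layout is an assumption).
make_names_csv() {
    awk -F: '
        {
            split($5, gecos, ",")            # GECOS may hold "Full Name,room,phone"
            n = split(gecos[1], parts, " ")  # split the full name into words
            if (n == 0) next                 # skip accounts without a full name
            last = ""
            for (i = 2; i <= n; i++)
                last = last (i > 2 ? " " : "") parts[i]
            printf "%s,%s,%s\n", $1, parts[1], last
        }'
}
```

On a system where the accounts carry full names, this could be run as `getent passwd | make_names_csv > names.csv`, and the result reviewed by hand before use.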
232
269
233
-

270
+

234
271
235
272
This has not been automated for this tutorial. We don't want you to fall asleep!
236
273
@@ -284,6 +321,13 @@ xdmod-import-csv -t names:
284
321
285
322
## Open XDMoD Functionality (Interactive Demo)
286
323
324
+
**Note** The Gateways2020 demo has additional anonymized historical data (about 2 months) that can be added. Loading it takes a while to run, depending on your system (mine took about 3 hours). This data will be used by the presenter for this demonstration.
325
+
326
+
If and when you run this, the output will look much like what we saw when we ran `/srv/xdmod/scripts/shred-ingest-aggregate-all.sh`.
327
+
328
+
```bash
329
+
sudo /srv/xdmod/historical/add-historical.sh
330
+
```
287
331
288
332
### Administration
289
333
@@ -299,12 +343,22 @@ Admin Dashboard:
299
343
300
344
Let's actually use Open XDMoD now.
301
345
302
-
User:
346
+
With a fully installed system we have quite a bit of data: job information, storage usage, cloud usage, and job performance (SUPREMM).
347
+

348
+
349
+
User Dashboard:
350
+

351
+
352
+

303
353
304
354
PI:
355
+

305
356
306
-
Center: Staff
357
+
Center Staff:
358
+
