Commit 5ad5485

Merge pull request #86 from plessbd/xdmod-gateways-updates
Gateways 2020 updates
2 parents ea1bf49 + 3e8c23a commit 5ad5485

7 files changed

+66
-12
lines changed

xdmod/README.md

Lines changed: 66 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -1,17 +1,22 @@
11
## Overview
22

3-
In this part of the tutorial we are going to install and configure Open XDMoD.
3+
**NOTE:**
4+
Because of COVID, this tutorial is virtual and much shorter than anticipated, so this part of the tutorial is going to be more of an interactive demo. Some parts will be covered more quickly than usual; however, our team is available in Slack and the Zoom chat to answer any questions that you may have.
5+
6+
In this part of the tutorial we are going to go over the installation and configuration of Open XDMoD.
47
The base component of Open XDMoD uses the job accounting logs from the HPC
5-
resource manager as the data source. We are also going to install the optional Job Performance Module. This
6-
allows Open XDMoD to also display performance data for HPC jobs.
8+
resource manager as the data source. We have also installed the optional Job Performance Module. This allows Open XDMoD to also display performance data for HPC jobs.
79

810
The asciinema recordings are not meant to be used on their own; they are intended for use in a "live" demonstration.
9-
Command Line Demos in a Light color, are meant to be watched. Dark theme are interactive
11+
12+
Command-line demos in a light theme are meant to be watched. Demos in a dark theme are interactive.
1013

1114
`VIM` is used to edit files in this tutorial. If you prefer a different editor, please install it on the xdmod container.
1215

1316
## Submit some jobs to the cluster
1417

18+
**NOTE:** For the Gateways2020 tutorial the presenter has already done this on their machine. If you are interested in running this on your own, please do so.
19+
1520
Before we install and configure XDMoD we are going to submit
1621
some HPC jobs to the cluster. These jobs will run while we go through
1722
the install and then we will be able to view the job information
@@ -24,16 +29,42 @@ ssh -p6222 hpcadmin@localhost
2429

2530
Run the provided script that submits several jobs to the cluster. These jobs
2631
run as multiple different users with different job sizes and durations. The
27-
purpose of this is to generate data to display in Open XDMoD. This, of course,
28-
would not be required on a production deployment. This script should be run
29-
as the hpcadmin user as it uses `sudo` to submit jobs as different cluster
30-
users.
32+
purpose of this is to generate data to display in Open XDMoD.
33+
34+
**NOTE**: This, of course, would not be required on a production deployment.
35+
36+
This script should be run as the hpcadmin user as it uses `sudo` to submit jobs as different cluster users.
3137
```bash
3238
submit_jobs.sh
3339
```
3440

41+
Output should look similar to:
42+
```bash
43+
[hpcadmin@xdmod ~]$ submit_jobs.sh
44+
Submitted batch job 2
45+
Submitted batch job 3
46+
Submitted batch job 4
47+
Submitted batch job 5
48+
Submitted batch job 6
49+
Submitted batch job 7
50+
Submitted batch job 8
51+
Submitted batch job 9
52+
Submitted batch job 10
53+
Submitted batch job 11
54+
Submitted batch job 12
55+
Submitted batch job 13
56+
Submitted batch job 14
57+
Submitted batch job 15
58+
Submitted batch job 16
59+
Submitted batch job 17
60+
Submitted batch job 18
61+
Submitted batch job 19
62+
```
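
The submission output above can be sanity-checked by counting the confirmations and pulling out the job IDs. The following is a minimal sketch, not part of the tutorial scripts: the `output` variable holds a few sample lines copied from the listing above, and capturing it directly from `submit_jobs.sh` assumes the script prints to stdout.

```bash
# Sample lines copied from the expected output above. In practice you might
# capture them with: output=$(submit_jobs.sh)
# (assumption: the script writes its confirmations to stdout)
output='Submitted batch job 2
Submitted batch job 3
Submitted batch job 4'

# Count the submission confirmations.
count=$(printf '%s\n' "$output" | grep -c '^Submitted batch job')

# Extract the numeric job IDs (fourth whitespace-separated field).
job_ids=$(printf '%s\n' "$output" | awk '/^Submitted batch job/ { print $4 }')

# Print a one-line summary with the IDs joined by commas.
echo "Submitted $count jobs: $(printf '%s\n' "$job_ids" | paste -sd, -)"
```

The extracted IDs could then be passed to the resource manager's own query tools to follow the jobs while the rest of the walkthrough proceeds.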
63+
3564
## Open XDMoD Installation
3665

66+
**Note** This part will be brief in the Gateways2020 tutorial. This process has already been done as part of the Docker container setup.
67+
3768
For this tutorial, the Open XDMoD software will be installed in the `xdmod` container.
3869
Open XDMoD will use the MySQL database from the `mysql` container. Since we
3970
will also be installing the optional Job Performance module we also run
@@ -54,6 +85,8 @@ Package Installation:
5485

5586
## Open XDMoD Configuration
5687

88+
**Note** This part will be brief in the Gateways2020 tutorial. This process has already been done as part of the Docker container setup.
89+
5790
### Prerequisites
5891

5992
The following information is needed by Open XDMoD:
@@ -149,6 +182,8 @@ Reference: [Hierarchy Guide](https://open.xdmod.org/hierarchy.html)
149182

150183
## Open XDMoD Job Performance
151184

185+
**Note** This part will be brief in the Gateways2020 tutorial. This process has already been done as part of the Docker container setup.
186+
152187
The Job Performance module is optional, but highly recommended.
153188

154189
![Job Performance Dataflow](./tutorial-screenshots/admin-job-performance-dataflow.png)
@@ -157,7 +192,7 @@ The Job Performance module is optional, but highly recommended.
157192

158193
[Job Performance](https://supremm.xdmod.org) data - for the Open source release we'll try to provide support for [Performance Co-Pilot (PCP)](https://pcp.io).
159194
We chose PCP because it is included by default in CentOS / Red Hat.
160-
In XSEDE we use tacc_stats and PCP (depending on the resource provider). and we have also used LDMS, Cray RUR and are aware of groups using Ganglia too.
195+
In XSEDE we use tacc_stats and PCP (depending on the resource provider). We are also aware of groups using LDMS, Cray RUR, and Ganglia. We now have a team looking into Prometheus.
161196

162197
PCP has been [installed](https://github.com/ubccr/hpc-toolset-tutorial/blob/master/slurm/install.sh#L80-L87) and configured on the compute nodes.
163198
This tutorial uses a cut-down list of PCP metrics from the recommended metrics for a production HPC system.
@@ -226,11 +261,13 @@ This is going to produce A LOT of output. Each of these commands have flags tha
226261

227262
## User / PI Names
228263

264+
**NOTE**: Feel free to skip this part in the Gateways2020 tutorial, as it does not impact the use of the system.
265+
229266
The resource manager logs contain the system usernames of the users that submitted jobs.
230267
To display the full names in Open XDMoD you must provide a data file that contains the
231268
full name of each user for each system username. This file is in a `csv` format.
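
As a rough illustration of building such a file, the sketch below turns passwd-style records into CSV rows. It rests on assumptions that should be checked against the Open XDMoD user names documentation: the column order (username, first name, last name) is assumed, and the sample users `jdoe` and `asmith` are hypothetical.

```bash
# Hypothetical passwd-style sample; on a real system similar records could
# come from `getent passwd`.
passwd_sample='jdoe:x:1001:1001:John Doe:/home/jdoe:/bin/bash
asmith:x:1002:1002:Alice Smith:/home/asmith:/bin/bash'

# Field 1 is the system username; field 5 (GECOS) holds the full name.
# Assumed CSV layout: username,first name,last name.
names_csv=$(printf '%s\n' "$passwd_sample" | awk -F: '{
    n = split($5, parts, " ")
    first = parts[1]
    last = (n > 1) ? parts[n] : ""
    printf "%s,%s,%s\n", $1, first, last
}')

printf '%s\n' "$names_csv"
```

Names with more than two parts keep only the first and last word in this sketch, so real data may need manual review before import.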
232269

233-
![Group By User(names not importe)](./tutorial-screenshots/usernames.png)
270+
![Group By User(names not imported)](./tutorial-screenshots/usernames.png)
234271

235272
This has not been automated for this tutorial. We don't want you to fall asleep!
236273

@@ -284,6 +321,13 @@ xdmod-import-csv -t names:
284321

285322
## Open XDMoD Functionality (Interactive Demo)
286323

324+
**Note** The Gateways2020 demo has additional anonymized historical data (about 2 months) that can be added. Loading it takes a while to run, depending on your system (mine took about 3 hours). This data will be used by the presenter for this demonstration.
325+
326+
If or when you run this, the output will look a lot like what we saw when we ran `/srv/xdmod/scripts/shred-ingest-aggregate-all.sh`:
327+
328+
```bash
329+
sudo /srv/xdmod/historical/add-historical.sh
330+
```
287331

288332
### Administration
289333

@@ -299,12 +343,22 @@ Admin Dashboard:
299343

300344
Let's actually use Open XDMoD now.
301345

302-
User:
346+
With a fully installed system we have quite a bit of data: job information, storage usage, cloud usage, and job performance (SUPREMM).
347+
![Public User Usage](./tutorial-screenshots/public-user-options.png)
348+
349+
User Dashboard:
350+
![Logged in User Dashboard](./tutorial-screenshots/loggedin-dashboard.png)
351+
352+
![Logged in User Job Performance](./tutorial-screenshots/loggedin-performance.png)
303353

304354
PI:
355+
![Logged in PI Dashboard](./tutorial-screenshots/loggedin-pi-dashboard.png)
305356

306-
Center: Staff
357+
Center Staff:
358+
![Logged in Center Staff Dashboard](./tutorial-screenshots/centerdirector-dashboard.png)
307359

360+
Report Generator:
361+
![Report Generator](./tutorial-screenshots/report-generator.png)
308362
## Tutorial Navigation
309363
[Next - OnDemand](../ondemand/README.md)
310364
[Previous Step - ColdFront](../coldfront/README.md)