PBS Pro Exporter

A Prometheus exporter for real-time job monitoring of PBS Professional HPC clusters. It gathers metrics from PBS job cgroups, along with job metadata and node metrics.

This exporter collects:

Node Metrics: Cluster-wide node status and attributes from pbsnodes.
Job Metrics: Job submission information for each PBS job.
Cgroup Metrics: Real-time CPU and memory usage for each job via cgroups. Supports both cgroup v1 and v2.

Usage

Configuration is managed with command-line flags. View command help:

pbs_exporter --help
usage: pbs_exporter [<flags>]

Flags:
  --[no-]help                      Show context-sensitive help (also try --help-long and --help-man).
  --[no-]cgroup.enabled            Enable cgroup collector.
  --cgroup.root="/sys/fs/cgroup"   Root path of cgroup filesystem hierarchy.
  --[no-]job.enabled               Enable job collector.
  --web.listen-address=":9307"     Address to listen on for web interface and telemetry.
  --[no-]node.enabled              Enable node collector.
  --job.pbs_home="/var/spool/pbs"  PBS home directory.
  --scrape.timeout=5               Per-scrape timeout in seconds.
  --log.level=info                 Only log messages with the given severity or above. One of: [debug, info, warn, error]
  --log.format=logfmt              Output format of log messages. One of: [logfmt, json]
  --[no-]version                   Show application version.

The exporter is designed to run in two modes: on compute nodes to gather job-specific data, and on a single node to gather cluster-wide metrics.

Job Metrics (Compute Node)

Run the exporter on all compute nodes to collect job and cgroup metrics:

pbs_exporter

Cluster Metrics (Head/Login Node)

PBS node metrics are the same regardless of which node reports them, so they should be collected from only one node or deduplicated. To collect only node metrics, run:

pbs_exporter --node.enabled --no-cgroup.enabled --no-job.enabled
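Node metrics come from pbsnodes. As an illustration of the kind of processing involved, the Go sketch below counts nodes per state, assuming the JSON layout printed by `pbsnodes -a -F json` on PBS Professional/OpenPBS: a top-level `nodes` object keyed by node name, each entry carrying a `state` string. This is not necessarily how pbs_exporter queries pbsnodes; `countByState` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// pbsnodesOutput models only the fields this sketch needs.
// ASSUMPTION: layout of `pbsnodes -a -F json`.
type pbsnodesOutput struct {
	Nodes map[string]struct {
		State string `json:"state"`
	} `json:"nodes"`
}

// countByState returns how many nodes report each state string
// (e.g. "free", "job-busy", "down,offline").
func countByState(data []byte) (map[string]int, error) {
	var out pbsnodesOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, n := range out.Nodes {
		counts[n.State]++
	}
	return counts, nil
}

func main() {
	sample := []byte(`{"nodes": {
		"node01": {"state": "free"},
		"node02": {"state": "job-busy"},
		"node03": {"state": "free"}}}`)
	counts, err := countByState(sample)
	if err != nil {
		panic(err)
	}
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	sort.Strings(states)
	for _, s := range states {
		fmt.Printf("%s %d\n", s, counts[s])
	}
	// Output:
	// free 2
	// job-busy 1
}
```

On a head node the input would come from something like `exec.Command("pbsnodes", "-a", "-F", "json").Output()`. Because every node sees the same server-side state, running this once per cluster is enough.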

Installation

Binaries can be downloaded from the GitHub releases page.

Build Instructions

Download the source and build it. Building requires Go and make.

git clone https://github.com/0nebody/pbs_exporter.git
cd pbs_exporter
make pbs_exporter

Monitoring

Grafana Dashboards

Pre-built Grafana dashboards can be downloaded along with the exporter from the GitHub releases page. For basic modifications, edit the dashboard configuration file and rebuild the dashboards from Jsonnet:

make dashboards

Dashboards are split into public and private dashboards. Public dashboards filter metrics to show only jobs launched by the logged-in user. This assumes that HPC and Grafana usernames match, for example by using a shared authentication backend.

GPU Metrics

This exporter does not collect GPU metrics, but it integrates with the NVIDIA DCGM exporter. The DCGM exporter requires a PBS hook to map job IDs to their assigned GPUs. Configuring the DCGM exporter for HPC jobs is documented in the NVIDIA DCGM repository.

Prometheus

An example Prometheus configuration is available in the repository to help you get started with scraping the exporter.

Common Issues

Privilege Requirements

Access to $PBS_HOME/mom_priv/jobs requires elevated privileges, which are needed whenever the job collector is enabled (--job.enabled).

To run with only the minimal elevated privilege required, grant the binary the CAP_DAC_READ_SEARCH capability:

setcap 'cap_dac_read_search=ep' pbs_exporter
