# slurmq

GPU quota management for Slurm clusters.

```console
$ slurmq check
╭──────────────────── GPU Quota Report ────────────────────╮
│                                                          │
│ User: dedalus                                            │
│ QoS: medium                                              │
│ Cluster: Stella HPC                                      │
│                                                          │
│ ████████████████████░░░░░░░░░░ 68.5%                     │
│                                                          │
│ Used: 342.5 GPU-hours                                    │
│ Remaining: 157.5 GPU-hours                               │
│ Quota: 500 GPU-hours (rolling 30 days)                   │
│                                                          │
╰──────────────────────────────────────────────────────────╯
```

## Installation

```console
uv tool install slurmq
```

## Configuration

```console
slurmq config init        # interactive wizard
slurmq config show        # verify settings
slurmq config validate    # check syntax before deploy
```

Config resolution order:
1. `SLURMQ_CONFIG` env var
2. `~/.config/slurmq/config.toml` (user)
3. `/etc/slurmq/config.toml` (system-wide)
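A minimal sketch of that lookup, assuming first match wins (the function name is illustrative, not slurmq's actual API):

```python
import os
from pathlib import Path

def resolve_config_path() -> Path | None:
    """First existing config file wins; illustrative sketch only."""
    candidates = [
        os.environ.get("SLURMQ_CONFIG"),             # 1. explicit override
        Path.home() / ".config/slurmq/config.toml",  # 2. per-user config
        Path("/etc/slurmq/config.toml"),             # 3. system-wide config
    ]
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None
```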
Example `config.toml`:

```toml
default_cluster = "stella"

[clusters.stella]
name = "Stella HPC"
account = "research"
qos = ["low", "medium"]
quota_limit = 500 # GPU-hours
rolling_window_days = 30
```

## Usage

### Check quota

```console
slurmq check                    # current user
slurmq check --user alice       # specific user
slurmq check --cluster other    # different cluster
slurmq check --forecast         # usage projection
slurmq --json check             # machine-readable
slurmq --quiet check            # silent on success (for scripts)
```
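GPU-hours are GPUs allocated × elapsed hours, summed over jobs inside the rolling window. A sketch of that arithmetic on top of `sacct` (an illustration of the math, not slurmq's actual implementation; it assumes the untyped `gres/gpu=` entry in `AllocTRES`):

```python
import subprocess
from datetime import datetime, timedelta

def gpu_hours(user: str, window_days: int = 30) -> float:
    """Sum GPU-hours for one user over a rolling window (sketch only)."""
    since = (datetime.now() - timedelta(days=window_days)).strftime("%Y-%m-%d")
    out = subprocess.run(
        ["sacct", "-u", user, "-S", since, "-X", "-n", "-P",
         "--format=AllocTRES,ElapsedRaw"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0.0
    for line in out.splitlines():
        tres, _, elapsed = line.partition("|")
        # AllocTRES looks like: billing=8,cpu=16,gres/gpu=4,mem=64G,node=1
        for item in tres.split(","):
            if item.startswith("gres/gpu="):
                total += int(item.split("=", 1)[1]) * int(elapsed or 0) / 3600
    return total
```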
### Job efficiency

Analyze job resource efficiency (like `seff`):

```console
slurmq efficiency 12345
```

Flags low efficiency: CPU < 30%, memory < 20%.
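The flagging rule is a plain threshold check on those two percentages; a sketch (thresholds from above, names illustrative):

```python
CPU_EFF_MIN = 30.0   # percent; threshold documented above
MEM_EFF_MIN = 20.0   # percent

def low_efficiency_flags(cpu_pct: float, mem_pct: float) -> list[str]:
    """Return warnings for a job's CPU/memory efficiency (sketch only)."""
    flags = []
    if cpu_pct < CPU_EFF_MIN:
        flags.append(f"low CPU efficiency: {cpu_pct:.1f}% < {CPU_EFF_MIN}%")
    if mem_pct < MEM_EFF_MIN:
        flags.append(f"low memory efficiency: {mem_pct:.1f}% < {MEM_EFF_MIN}%")
    return flags
```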
### Reports

Generate usage reports (admin):

```console
slurmq report                   # table view
slurmq report --format csv -o out.csv
```

### Monitoring

Real-time monitoring with optional enforcement (admin):
```console
slurmq monitor                  # live dashboard, 30s refresh
slurmq monitor --interval 10
slurmq monitor --once           # single check, for cron
slurmq monitor --enforce        # cancel jobs over quota
```

### Stats

Cluster-wide analytics with month-over-month comparison:
```console
slurmq stats                         # GPU utilization + wait times
slurmq stats --days 14               # custom period
slurmq stats --no-compare            # skip MoM comparison
slurmq stats -p gpu -p gpu-large     # specific partitions
slurmq stats --small-threshold 25    # custom job size threshold
slurmq --json stats                  # machine-readable
```

Shows:
- GPU utilization by partition/QoS
- Wait time analysis (median, % jobs waiting > 6h; see the sketch below)
- Small vs large job breakdown
- Month-over-month trends
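The wait-time numbers reduce to per-job queue waits (start minus submit); a sketch of the two reported statistics (illustrative, not slurmq's code):

```python
from statistics import median

def wait_time_summary(waits_h: list[float], threshold_h: float = 6.0) -> dict:
    """Median queue wait and share of jobs over the threshold (sketch).
    waits_h: per-job (Start - Submit) in hours, e.g. taken from sacct."""
    if not waits_h:
        return {}
    over = sum(w > threshold_h for w in waits_h)
    return {
        "median_wait_h": median(waits_h),
        "pct_over_threshold": 100 * over / len(waits_h),
    }
```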
## Enforcement

Cancel jobs automatically when users exceed quota:

```toml
[enforcement]
enabled = true
dry_run = true # preview mode
grace_period_hours = 24 # warn before cancel
exempt_users = ["admin"]
exempt_job_prefixes = ["checkpoint_"]
```

Run with `slurmq monitor --enforce`. Disable `dry_run` when ready.
Grace period: users exceeding quota get a warning window before jobs are cancelled.
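A sketch of the decision these settings imply for a single running job (key names match the TOML above; the function itself is illustrative, not slurmq's internal API):

```python
from datetime import datetime, timedelta

def should_cancel(job_name: str, user: str,
                  over_quota_since: datetime | None, cfg: dict) -> bool:
    """Apply [enforcement] settings to one job (illustrative sketch)."""
    if not cfg["enabled"] or user in cfg["exempt_users"]:
        return False
    if any(job_name.startswith(p) for p in cfg["exempt_job_prefixes"]):
        return False
    if over_quota_since is None:       # user is within quota
        return False
    grace = timedelta(hours=cfg["grace_period_hours"])
    if datetime.now() - over_quota_since < grace:
        return False                   # still in the warning window
    return not cfg["dry_run"]          # dry_run previews, never cancels
```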
## Job states

Problematic states are highlighted:
| State | Meaning       |
|-------|---------------|
| `OOM` | Out of Memory |
| `TO`  | Timeout       |
| `NF`  | Node Failure  |
| `F`   | Failed        |
| `PR`  | Preempted     |
## Scripting

```bash
# check quota status
if slurmq --json check | jq -e '.status == "exceeded"' > /dev/null; then
    echo "Quota exceeded"
fi
```
```cron
# cron: enforce every 5 minutes (quiet mode)
*/5 * * * * slurmq --quiet monitor --once --enforce >> /var/log/slurmq.log 2>&1
```

## Documentation

Online: [dedalus-labs.github.io/slurmq](https://dedalus-labs.github.io/slurmq)
For LLMs: llms.txt | llms-full.txt
Locally:

```bash
uv sync --extra docs
uv run mkdocs serve
```

## Development

```bash
git clone https://github.com/dedalus-labs/slurmq.git && cd slurmq
uv sync --all-extras
uv run pytest
uv run ruff check
uv run ty check
```

## License

MIT