Skip to content

Conversation

@samuelcwilliams
Copy link
Collaborator

Added support for Slurm to uit.py.

@samuelcwilliams samuelcwilliams requested review from araglu and sdc50 March 17, 2025 14:20
@samuelcwilliams samuelcwilliams self-assigned this Mar 17, 2025
@samuelcwilliams samuelcwilliams marked this pull request as draft March 17, 2025 14:22
@samuelcwilliams
Copy link
Collaborator Author

@sdc50 @araglu here's the draft pull request. I'll address the issues in the pipeline in a little bit, but in the meantime I'd appreciate any feedback y'all have!

@araglu
Copy link
Collaborator

araglu commented Mar 17, 2025

I think some of these changes will also need to be made in async_client.py. It has very similar code to uit.py but in an async context. I noticed our app is calling the async get_raw_queue_stats(), so it did not get your changes.

@samuelcwilliams
Copy link
Collaborator Author

Will do! I started on it a while back, but ran into some bugs with what I already had in uit.py so I reverted the changes so I could start fresh once I had those bugs worked out.

I'll get those updates rolled in ASAP!

@samuelcwilliams samuelcwilliams marked this pull request as ready for review March 21, 2025 19:03
Comment on lines 256 to 257
self.scheduler = NODE_TYPES[self.system]["scheduler"]
self.commands = COMMANDS[self.scheduler]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right spot for these two lines? I get an error on them whenever I try to submit a new job or delete that job.

In uit.py, these lines are in prepare_connect() which is called by both sync and async connect(), so async_client.py might not need these lines here.

Copy link
Collaborator Author

@samuelcwilliams samuelcwilliams Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess I didn't realize it was calling prepare_connect() from the regular client. I'll make the changes ASAP

@araglu
Copy link
Collaborator

araglu commented Mar 24, 2025

The async submit() is still using a hard-coded "qsub" command in async_client.py:590.

Edit: the error message from qsub is:

sbatch: error: invalid partition specified: background
sbatch: error: Batch job submission failed: Invalid partition name specified

I thought using the qsub compatibility command would work until we get pyuit to write out a Slurm script, but I can't get it to work, even with other queues.

@samuelcwilliams samuelcwilliams marked this pull request as draft March 25, 2025 14:57
@araglu
Copy link
Collaborator

araglu commented Mar 25, 2025

The "Max Wall Time" and "Max Processes" don't look right to me. Wall time said 10080 when I was expecting it in a 04:00:00 format. The processes looked more like number of nodes when I'm used to seeing number of cores.

ERDC\RDITLSCW and others added 2 commits April 23, 2025 14:30
This also sorts the list by system name.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants