
Conversation

@pablogoitia

This PR addresses #2697. While implementing the wrappers, I realized that there are Slurm directives that cannot be mapped directly onto Flux ones, because the set of Flux batch directives is more limited. As a solution, I have been exploring the option of launching jobs on the HPC by manually generating Jobspecs (represented as YAML files), which are processed using the Flux Python API. Jobspecs give us finer-grained control over resources. So far, I have not found a more direct way to launch jobs from their specification.
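To make the idea concrete, here is a minimal sketch of how a YAML Jobspec could be submitted through the flux-core Python bindings (flux.Flux, Jobspec.from_yaml_stream and flux.job.submit); the file name is illustrative and this is not the PR's actual code:

# Minimal sketch, not the PR's implementation: submit a YAML Jobspec via the Flux Python API.
# "wrapper_jobspec.yaml" is an illustrative file name.
import flux
import flux.job
from flux.job import Jobspec

handle = flux.Flux()  # connect to the enclosing Flux instance

with open("wrapper_jobspec.yaml") as f:
    jobspec = Jobspec.from_yaml_stream(f.read())

jobid = flux.job.submit(handle, jobspec)
print(f"submitted job {jobid}")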

In this PR, I provide a prototype for solving this problem. The implementation is not yet fully functional: I only provide limited support for vertical wrappers. However, extending the implementation to other types of wrappers would be relatively straightforward.

The harder part, which would require more dedication, is the correct construction of the Jobspecs, ensuring that the scheduling parameters (e.g., processors, tasks, threads) are accurately mapped.

In the future, this method would also ease introducing equivalents to Slurm's hetjobs, for which Flux still has no direct alternative.

Note: this PR would overwrite a significant part of the implementation carried out in PR #2708.

Check List
Not applicable, as the branch will not be merged into master yet.

@pablogoitia pablogoitia self-assigned this Nov 26, 2025
@pablogoitia pablogoitia added enhancement New feature or request working on Someone is working on it labels Nov 26, 2025
@pablogoitia pablogoitia moved this from Todo to In Progress in Autosubmit project Nov 26, 2025
@manuel-g-castro
Contributor

Hi, I was trying your branch to look into the issue with the module load, and I found some weird things.

I have the following yaml for the jobs:

JOBS:
  SIM:
    DEPENDENCIES: SIM-1
    RUNNING: chunk
    PROCESSORS: 10
    WALLCLOCK: 00:10
    TASKS: 1
    SCRIPT: |
        echo $PATH
        which module
        module load impi

And when I executed it, I found that Autosubmit was making the following request:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          33336145       gpp a01a_AST bsc03237 PD       0:00     10 (None)

Maybe the PROCESSORS is being misinterpreted as the number of nodes.

@dbeltrankyl
Collaborator

dbeltrankyl commented Dec 5, 2025

Maybe the PROCESSORS is being misinterpreted as the number of nodes.

I didn't check how it is coded, but another possibility is that you don't have PLATFORMS.PLATFORM.PROCESSORS_PER_NODE defined, so instead of 112 for mn5 it is taking the value as 1?

@manuel-g-castro
Contributor

Hi @dbeltrankyl ! Thanks for dropping in!

I have the following for the platform:

PLATFORMS:
  MARENOSTRUM5:
    TYPE: slurm
    HOST: ...
    PROJECT: ...
    USER: ...
    QUEUE: gp_debug
    SCRATCH_DIR: /gpfs/scratch
    ADD_PROJECT_TO_HOST: false
    MAX_WALLCLOCK: 48:00
    PROCESSORS_PER_NODE: 112
    MAX_PROCESSORS: 112

@dbeltrankyl
Collaborator

Then it is ignoring both max_processors and processors_per_node, no?

as 10 x 112 is way more than 112 😅

@manuel-g-castro
Contributor

Hi @pablogoitia, regarding the "command not found" issue with module.

I have executed your branch and faced the same issue.

To test it, I created this job file to see what is loaded in the environment of the inner job. I am doing all of my tests on MareNostrum 5.

JOBS:
  SIM:
    DEPENDENCIES: SIM-1
    RUNNING: chunk
    PROCESSORS: 1
    WALLCLOCK: 00:10
    TASKS: 1
    SCRIPT: printenv

Only to find that everything was unset, with the exception of Flux-specific variables (I attach the output at the end of this comment as an appendix).

So that is why it is not able to find any system executable. But then I altered your submission script to do the same right before the execution of srun flux start... and found that all the environment variables seem to be properly set.

Then I noticed that the ASThread job was producing an error (this is not transferred back to the local machine, not sure why). And there I saw the following message:

Dec 05 12:36:41.085766 CET 2025 job-list.err[0]: parse_jobspec: job f9Fs12P invalid jobspec; level 0: Expected integer, got object

So my guess is that something is failing in the jobspec, so it is not being executed properly.
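For reference, this failure can be reproduced in a reduced form with the Python bindings, assuming Jobspec.from_yaml_stream accepts the RFC 14 object-valued count (the YAML below is trimmed and illustrative, not the real inner jobspec):

# Sketch reproducing the TypeError shown in the appendix; not the actual flux_runner.py.
from flux.job import Jobspec

yaml_spec = """
version: 1
resources:
- type: node
  count:
    min: 1            # object-valued count, as in the inner jobspec below
  with:
  - type: slot
    label: task
    count: 1
    with:
    - type: core
      count: 1
tasks:
- command: ["true"]
  slot: task
  count:
    per_slot: 1
attributes:
  system:
    duration: 600
"""

jobspec = Jobspec.from_yaml_stream(yaml_spec)
# resource_counts() walks the resource tree multiplying integer counts, so the
# object-valued node count raises:
#   TypeError: unsupported operand type(s) for *: 'int' and 'dict'
print(jobspec.resource_counts())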

APPENDIX

ASTHREAD FULL ERROR

load MINICONDA/24.1.2 (PATH, LD_LIBRARY_PATH, LIBRARY_PATH, C_INCLUDE_PATH,
CPLUS_INCLUDE_PATH, PKG_CONFIG_PATH, MANPATH) 
flux-start: /home/bsc/bsc032371/venvs/flux/libexec/flux/cmd/flux-broker python flux_runner.py
Dec 05 12:36:41.085766 CET 2025 job-list.err[0]: parse_jobspec: job f9Fs12P invalid jobspec; level 0: Expected integer, got object
Traceback (most recent call last):
  File "/gpfs/scratch/bsc32/bsc032371/a01a/LOG_a01a/flux_runner.py", line 27, in <module>
    print("RESOURCE COUNTS :" + str(jobspec.resource_counts()))
                                    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/bsc/bsc032371/venvs/flux/lib/python3.13/site-packages/flux/job/Jobspec.py", line 779, in resource_counts
    for _, resource, count in self.resource_walk():
                              ~~~~~~~~~~~~~~~~~~^^
  File "/home/bsc/bsc032371/venvs/flux/lib/python3.13/site-packages/flux/job/Jobspec.py", line 739, in walk_helper
    res_count = count * resource["count"]
                ~~~~~~^~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'int' and 'dict'
Dec 05 12:36:41.113821 CET 2025 broker.err[0]: rc2.0: python flux_runner.py Exited (rc=1) 0.2s
srun: error: gs02r2b16: task 0: Exited with exit code 1
srun: Terminating StepId=33338054.0

INNER JOBSPEC

###############################################################################
#                   SIM a01a EXPERIMENT
###############################################################################

resources:
- type: node
  count:
    min: 1
  exclusive: false
  with:
  - type: slot
    label: task
    count: 1
    with:
    - type: core
      count: 1
tasks:
- command:
  - '{{tmpdir}}/script'
  slot: task
  count:
    per_slot: 1
attributes:
  system:
    duration: 600
    cwd: /gpfs/scratch/bsc32/bsc032371/a01a/LOG_a01a
    job:
      name: a01a_20000101_fc0_1_SIM
    shell:
      options:
        output:
          stdout:
            type: file
            path: /gpfs/scratch/bsc32/bsc032371/a01a/LOG_a01a/a01a_20000101_fc0_1_SIM.cmd.out.0
          stderr:
            type: file
            path: /gpfs/scratch/bsc32/bsc032371/a01a/LOG_a01a/a01a_20000101_fc0_1_SIM.cmd.err.0
    files:
      script:
        mode: 33216
        data: |+
          #!/bin/bash

          ###############################################################################
          # The following lines contain the script. [SIM a01a EXPERIMENT]
          ###############################################################################

          ###################
          # Autosubmit header
          ###################
          locale_to_set=$(locale -a | grep ^C.)
          if [ -z "$locale_to_set" ] ; then
              # locale installed...
              export LC_ALL=$locale_to_set
          else
              # locale not installed...
              locale_to_set=$(locale -a | grep ^en_GB.utf8)
              if [ -z "$locale_to_set" ] ; then
                  export LC_ALL=$locale_to_set
              else
                  export LC_ALL=C
              fi 
          fi

          set -xuve
          job_name_ptrn='/gpfs/scratch/bsc32/bsc032371/a01a/LOG_a01a/a01a_20000101_fc0_1_SIM'
          echo $(date +%s) > ${job_name_ptrn}_STAT_0

          ################### 
          # AS CHECKPOINT FUNCTION
          ###################
          # Creates a new checkpoint file upon call based on the current numbers of calls to the function

          AS_CHECKPOINT_CALLS=0
          function as_checkpoint {
              AS_CHECKPOINT_CALLS=$((AS_CHECKPOINT_CALLS+1))
              touch ${job_name_ptrn}_CHECKPOINT_${AS_CHECKPOINT_CALLS}
          }
          

          ###################
          # Autosubmit job
          ###################

          r=0
          set +e
          bash -e <<__AS_CMD__
          set -xuve
          printenv


          __AS_CMD__

          r=$?

          # Write the finish time in the job _STAT_
          echo $(date +%s) >> ${job_name_ptrn}_STAT_0

          # If the user-provided script failed, we exit here with the same exit code;
          # otherwise, we let the execution of the tailer happen, where the _COMPLETED
          # file will be created.
          if [ $r -ne 0 ]; then
              exit $r
          fi
          ###################
          # Autosubmit tailer
          ###################
          set -xuve
          touch ${job_name_ptrn}_COMPLETED
          exit 0

        encoding: utf-8
version: 1

INNER JOB PRINTENV

PMI_SIZE=1
PMI_FD=21
PWD=/gpfs/scratch/bsc32/bsc032371/a01a/LOG_a01a
FLUX_TASK_RANK=0
OMPI_MCA_btl_vader_backing_directory=/scratch/tmp/33338054/flux-owJ029/jobtmp-0-f9Fs12P
FLUX_KVS_NAMESPACE=job-5419040768
CUDA_DEVICE_ORDER=PCI_BUS_ID
FLUX_TERMINUS_SESSION=0
FLUX_JOB_NNODES=1
FLUX_JOB_SIZE=1
CUDA_VISIBLE_DEVICES=-1
FLUX_JOB_TMPDIR=/scratch/tmp/33338054/flux-owJ029/jobtmp-0-f9Fs12P
SHLVL=2
FLUX_PMI_LIBRARY_PATH=/home/bsc/bsc032371/venvs/flux/lib/flux/libpmi.so
FLUX_URI=local:///scratch/tmp/33338054/flux-owJ029/local-0
PMI_RANK=0
LD_LIBRARY_PATH=/home/bsc/bsc032371/venvs/flux/lib/flux
FLUX_JOB_ID_PATH=/f9Fs12P
FLUX_JOB_ID=f9Fs12P
LC_ALL=C
FLUX_TASK_LOCAL_ID=0
_=/usr/bin/printenv

@pablogoitia
Author

Maybe the PROCESSORS is being misinterpreted as the number of nodes.

Hi @manuel-g-castro! Fortunately, I can say that this is the expected behavior. For that task, you are requesting 10 PROCESSORS and 1 TASK. This is translated in the ASTHREAD script header into 10 tasks and 1 task per node, so the result is a request for a total of 10 nodes.
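To make the arithmetic explicit, a tiny illustration (variable names are mine, not Autosubmit internals):

# Illustrative arithmetic only; not Autosubmit's actual code.
processors = 10      # JOBS.SIM.PROCESSORS -> total tasks requested
tasks_per_node = 1   # JOBS.SIM.TASKS -> tasks placed on each node

nodes = processors // tasks_per_node
print(nodes)  # 10, matching the NODES column in the squeue output above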

@dbeltrankyl left a really good explanation of how the job resource parameters work in this issue.

@pablogoitia
Author

So my guess is that something is failing in the jobspec, so it is not being executed properly.

Hi @manuel-g-castro. Thank you so much for reporting this. I have not tested this specific case remotely yet, but it is a special one because it covers requests where no node count is specified but tasks per node are, for example (remember we talked about it yesterday in the meeting). In this case, what I do is request a minimum of one node using the min key in the node resource. I will check whether this is an error in the Jobspec or a matter of compatibility with Jobspec V1. I would initially say it is the former, because I have observed a discrepancy between my specification and an example in the docs. In that case, I will tell you when I upload the fix to the branch.

Meanwhile, any other job specification that includes the node count should work properly, including those where neither nodes nor tasks per node are provided.

@pablogoitia
Author

pablogoitia commented Dec 5, 2025

Hi again, @manuel-g-castro. After some testing, I have concluded that the job specification is right. There are some examples in RFC 14; specifically, Use Case 1.6 shows an example of a min count of nodes.

However, something is leading Flux to fail, and it does not matter whether Jobspec V1 or the general Jobspec is used, because it fails either way.

I will search for a way to handle this specific case so that I can avoid using the min key, and I will let you know the solution.

Thanks again for reporting the bug. If you want to keep testing, remember that I expect any other case to work. For now, do not test cases where tasks per node are specified without also specifying the node count.
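As a possible direction (an assumption on my side, not a committed fix), the wrapper could emit a plain integer node count whenever tasks per node are given without a node count, so the resource section stays within what the Python bindings' resource walk handles:

resources:
- type: node
  count: 1            # plain integer instead of "count: {min: 1}"
  exclusive: false
  with:
  - type: slot
    label: task
    count: 1
    with:
    - type: core
      count: 1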
