Make snapshot default spot #11

Open · wants to merge 3 commits into base: main
49 changes: 35 additions & 14 deletions 02_setup_compute/README.md

This directory contains YAML configuration files for the creation of two compute environments:

- `aws_fusion_nvme.yml`: This compute environment is designed to run on Amazon Web Services (AWS) Batch and uses Fusion V2 on SPOT instances with 6th-generation Intel instance types with NVMe storage and the Fusion Snapshots feature activated. Fusion Snapshots is a new Fusion feature that lets you snapshot and restore your machine when a spot interruption occurs.
- `aws_plain_s3.yml`: This compute environment is designed to run on Amazon Web Services (AWS) Batch and uses plain AWS Batch with S3 storage.

These YAML files provide best practice configurations for utilizing these two storage types in AWS Batch compute environments. The Fusion V2 configuration is tailored for high-performance workloads leveraging NVMe storage, while the plain S3 configuration offers a standard setup for comparison and workflows that don't require the advanced features of Fusion V2.
- You have an S3 bucket for the Nextflow work directory.
- You have reviewed and updated the environment variables in [env.sh](../01_setup_environment/env.sh) to match your specific AWS setup.

### Using existing manual AWS queues in your compute environments

#### Setting manual queues during CE creation with seqerakit

If you are not standing up your compute queues with Batch Forge but using a manual setup approach, you will need to modify your YAML configurations: change `config-mode: forge` to `config-mode: manual` and add the following lines, pointing to your specific queues, to the YAML files.

```yaml
head-queue: "myheadqueue-head"
compute-queue: "mycomputequeue-work"
```

Please note that with manual queues, the resource labels must already be attached to your queues; setting them on the Seqera Platform during CE creation will not work.
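
Putting this together, a minimal sketch of a manual-mode compute environment definition might look like the following (assuming your head and compute queues already exist in AWS Batch; the queue names are the placeholders from above, the remaining fields reuse the variables described later in this README, and Fusion-related options from the Forge example below can be added as needed):

```yaml
compute-envs:
  - type: aws-batch
    config-mode: manual                    # instead of: forge
    name: "${COMPUTE_ENV_PREFIX}_fusion_snapshots"
    workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
    credentials: "$AWS_CREDENTIALS"
    region: "$AWS_REGION"
    work-dir: "$AWS_WORK_DIR"
    head-queue: "myheadqueue-head"         # your existing head queue
    compute-queue: "mycomputequeue-work"   # your existing compute queue
    wait: "AVAILABLE"
    overwrite: False
```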

#### Manually setting the launch template for Fusion

If you are not using Batch Forge to set up your queues, you will also have to manually set the launch template for your instances in your Fusion queues. To do this, add the provided [Fusion launch template](./fusion_launch_template.txt) to your AWS Batch account, then clone your existing AWS compute environment and, during the instance configuration step, choose the Fusion launch template you created.

### YAML format description

#### 1. Environment Variables in the YAML

Using these variables allows easy customization of the compute environment configuration.

#### 2. Fusion V2 Compute Environment

Fusion Snapshots is a new Fusion feature that lets you snapshot and restore your machine when a spot interruption occurs. If we inspect the contents of [`aws_fusion_snapshots.yml`](./compute-envs/aws_fusion_snapshots.yml) as an example, we can see the overall structure is as follows:

```yaml
compute-envs:
  - type: aws-batch
    config-mode: forge
    name: "${COMPUTE_ENV_PREFIX}_fusion_snapshots"
    workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
    credentials: "$AWS_CREDENTIALS"
    region: "$AWS_REGION"
    work-dir: "$AWS_WORK_DIR"
    wave: True
    fusion-v2: True
    fast-storage: True
    snapshots: True
    no-ebs-auto-scale: True
    provisioning-model: "SPOT"
    instance-types: "c6id.4xlarge,c6id.8xlarge,r6id.2xlarge,m6id.4xlarge,c6id.12xlarge,r6id.4xlarge,m6id.8xlarge"
    max-cpus: 1000
    allow-buckets: "$AWS_COMPUTE_ENV_ALLOWED_BUCKETS"
    labels: "storage=fusionv2,project=benchmarking"
    wait: "AVAILABLE"
    overwrite: False
```

> **Reviewer comment (Collaborator)** on `snapshots: True`: neato burrito
<details>
<summary>Click to expand: YAML format explanation</summary>

Note that it is very similar to the Fusion V2 compute environment, but with the following differences:

> **Reviewer comment (Collaborator):** Does this make sense anymore if we're ONLY using Fusion snapshots?

- `provisioning-model` is set to `SPOT` to enable the use of spot instances.
- `snapshots` is set to `True` to allow Fusion to automatically restore a job interrupted by a spot reclamation.
- `instance-types` is set to a restrictive set of types that have sufficient memory and bandwidth to snapshot the machine within the time limit imposed by AWS during a spot reclamation event.

</details>
Note: When setting `snapshots: True`, Fusion, Wave, and fast instance storage are enabled by default for the CE. We have set them to `True` explicitly here for documentation purposes and consistency.

> **Reviewer comment (Collaborator):** Is that true or a front end thing? I would rephrase it to be fast storage and Wave are required for Fusion.

#### Pre-configured Options in the YAML

We've pre-configured several options to optimize your Fusion snapshots compute environment:

| Option | Value | Purpose |
|--------|-------|---------|
| `wave` | `True` | Enables Wave, required for Fusion in containerized workloads |
| `fusion-v2` | `True` | Enables Fusion V2 |
| `fast-storage` | `True` | Enables fast instance storage with Fusion V2 for optimal performance |
| `snapshots` | `True` | Enables automatic snapshot creation and restoration for spot instance interruptions |
| `no-ebs-auto-scale` | `True` | Disables EBS auto-expandable disks (incompatible with Fusion V2) |
| `provisioning-model` | `"SPOT"` | Selects the cost-effective spot pricing model |
| `instance-types` | `"c6id.4xlarge,c6id.8xlarge,`<br>`r6id.2xlarge,m6id.4xlarge,`<br>`c6id.12xlarge,r6id.4xlarge,`<br>`m6id.8xlarge"` | Selects instance types with a small enough memory footprint and a fast enough network to snapshot the machine within the time limit imposed by AWS during a spot reclamation event |
| `max-cpus` | `1000` | Sets the maximum number of CPUs for this compute environment |

> **Reviewer suggestion (Collaborator):** shorten the `instance-types` purpose to "Selects instance types with small memory and fast network to snapshot within AWS's time limit during spot reclamation."

These options ensure your Fusion V2 compute environment is optimized for compatibility with the snapshot feature.

> **Reviewer suggestion (Collaborator):** simplify to "These options ensure your Fusion V2 compute environment is optimized."

#### 3. Plain S3 Compute Environment

Similarly, if we inspect the contents of [`aws_plain_s3.yml`](./compute-envs/aws_plain_s3.yml) as an example, we can see the overall structure is as follows:
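
The YAML body itself is collapsed in this diff view. As an illustrative sketch only (not the file's verbatim contents, and with a hypothetical `name` value), its structure mirrors the Fusion example above minus the Fusion-specific flags:

```yaml
compute-envs:
  - type: aws-batch
    config-mode: forge
    name: "${COMPUTE_ENV_PREFIX}_plain_s3"   # hypothetical name
    workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
    credentials: "$AWS_CREDENTIALS"
    region: "$AWS_REGION"
    work-dir: "$AWS_WORK_DIR"
    provisioning-model: "SPOT"
    max-cpus: 1000
    allow-buckets: "$AWS_COMPUTE_ENV_ALLOWED_BUCKETS"
    labels: "storage=plains3,project=benchmarking"
    wait: "AVAILABLE"
    overwrite: False
```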

We will additionally use process-level labels for further granularity.
To add labels to your compute environment:

1. In the YAML file, locate the `labels` field.
2. Add your desired labels as a comma-separated list of key-value pairs. We have pre-populated this with the `storage=fusion|plains3` and `project=benchmarking` labels for better organization. If you have a pre-existing label, you can use it here as well; for example, if you have previously used the `project` label and it is activated in AWS, you could use `project=fusion_poc_plainS3CE` and `project=fusion_poc_fusionCE` to distinguish the two compute environments, as sketched below.
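
For instance, with hypothetical values reusing an already-activated `project` tag key:

```yaml
# In the plain S3 compute environment YAML:
labels: "project=fusion_poc_plainS3CE"

# In the Fusion compute environment YAML:
labels: "project=fusion_poc_fusionCE"
```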

### Networking
If your compute environments require a custom networking setup using a custom VPC, subnets, and security groups, these can be added as additional YAML fields, as sketched below.
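
For example, fields like the following could be appended to a compute environment entry (the option names mirror the `tw compute-envs add aws-batch` CLI flags; the IDs are placeholders):

```yaml
    vpc-id: "vpc-0123456789abcdef0"
    subnets: "subnet-aaaa1111,subnet-bbbb2222"
    security-groups: "sg-0123456789abcdef0"
```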
1 change: 1 addition & 0 deletions 02_setup_compute/compute-envs/aws_fusion_nvme.yml

```yaml
    wave: True
    fusion-v2: True
    fast-storage: True
    snapshots: True
    no-ebs-auto-scale: True
    provisioning-model: "SPOT"
    instance-types: "c6id,m6id,r6id"
```

83 changes: 83 additions & 0 deletions 02_setup_compute/fusion_launch_template.txt

```text
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/cloud-config; charset="us-ascii"

#cloud-config
write_files:
  - path: /root/tower-forge.sh
    permissions: '0744'
    owner: root
    content: |
      #!/usr/bin/env bash
      ## Stop the ECS agent if running and log all output to the console
      systemctl stop ecs
      exec > >(tee /var/log/tower-forge.log|logger -t BatchForge -s 2>/dev/console) 2>&1
      ## Install dependencies and the CloudWatch agent
      yum install -q -y jq sed wget unzip nvme-cli lvm2
      curl -s https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm -o amazon-cloudwatch-agent.rpm
      rpm -U ./amazon-cloudwatch-agent.rpm
      rm -f ./amazon-cloudwatch-agent.rpm
      curl -s https://nf-xpack.seqera.io/amazon-cloudwatch-agent/config-v0.4.json \
        | sed 's/$FORGE_ID/ambry-example/g' \
        > /opt/aws/amazon-cloudwatch-agent/bin/config.json
      /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
        -a fetch-config \
        -m ec2 \
        -s \
        -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
      ## Format the NVMe instance-store disks (combined via LVM if more than one)
      ## and mount them at /scratch/fusion
      mkdir -p /scratch/fusion
      NVME_DISKS=($(nvme list | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }'))
      NUM_DISKS=${#NVME_DISKS[@]}
      if (( NUM_DISKS > 0 )); then
        if (( NUM_DISKS == 1 )); then
          mkfs -t xfs ${NVME_DISKS[0]}
          mount ${NVME_DISKS[0]} /scratch/fusion
        else
          pvcreate ${NVME_DISKS[@]}
          vgcreate scratch_fusion ${NVME_DISKS[@]}
          lvcreate -l 100%FREE -n volume scratch_fusion
          mkfs -t xfs /dev/mapper/scratch_fusion-volume
          mount /dev/mapper/scratch_fusion-volume /scratch/fusion
        fi
      fi
      chmod a+w /scratch/fusion
      ## Tune ECS agent behaviour: cache images, enable spot draining,
      ## and extend container timeouts
      mkdir -p /etc/ecs
      echo ECS_IMAGE_PULL_BEHAVIOR=once >> /etc/ecs/ecs.config
      echo ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true >> /etc/ecs/ecs.config
      echo ECS_ENABLE_SPOT_INSTANCE_DRAINING=true >> /etc/ecs/ecs.config
      echo ECS_CONTAINER_CREATE_TIMEOUT=10m >> /etc/ecs/ecs.config
      echo ECS_CONTAINER_START_TIMEOUT=10m >> /etc/ecs/ecs.config
      echo ECS_CONTAINER_STOP_TIMEOUT=10m >> /etc/ecs/ecs.config
      echo ECS_MANIFEST_PULL_TIMEOUT=10m >> /etc/ecs/ecs.config
      systemctl stop docker
      ## Install AWS CLI v2
      curl "https://awscli.amazonaws.com/awscli-exe-linux-$(arch).zip" -o "awscliv2.zip"
      unzip -q awscliv2.zip
      sudo ./aws/install
      ## Expand the EBS boot volume to a 100 GB gp3 volume and grow the root filesystem
      TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
      INSTANCEID=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
      X_ZONE=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -fs http://169.254.169.254/latest/meta-data/placement/availability-zone)
      AWS_DEFAULT_REGION=$(echo "$X_ZONE" | sed 's/[a-z]$//')
      VOLUMEID=$(aws --region $AWS_DEFAULT_REGION ec2 describe-instances --instance-id $INSTANCEID | jq -r .Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId)
      aws --region $AWS_DEFAULT_REGION ec2 modify-volume --volume-id $VOLUMEID --size 100 --volume-type gp3 --throughput 325
      ## Wait with exponential backoff for the volume modification to take effect
      i=1; until [ "$(aws --region $AWS_DEFAULT_REGION ec2 describe-volumes-modifications --volume-id $VOLUMEID --filters Name=modification-state,Values="optimizing","completed" | jq '.VolumesModifications | length')" == "1" ] || [ $i -eq 256 ]; do
        sleep $i
        i=$(( i * 2 ))
      done
      if [ $i -eq 256 ]; then
        echo "ERROR expanding EBS boot disk size"
        aws --region $AWS_DEFAULT_REGION ec2 describe-volumes-modifications --volume-id $VOLUMEID
      fi
      growpart /dev/xvda 1
      xfs_growfs -d /
      systemctl start docker
      systemctl enable --now --no-block ecs
      ## Raise kernel dirty-page writeback thresholds for sustained I/O
      echo "1258291200" > /proc/sys/vm/dirty_bytes
      echo "629145600" > /proc/sys/vm/dirty_background_bytes

runcmd:
  - bash /root/tower-forge.sh

--//--
```

1 change: 1 addition & 0 deletions 03_setup_pipelines/README.md
- You have set up a Fusion V2 and plain S3 compute environment in the Seqera Platform in the [previous section](../02_setup_compute/README.md).
- You have created an S3 bucket for saving the workflow outputs.
- For effective use of resource labels, you have set up Split Cost Allocation tracking in your AWS account and activated the tags as mentioned in [this guide](../docs/assets/aws-split-cost-allocation-guide.md).
- **Exception**: If you cannot activate the resource labels we suggest here but can use existing resource labels, make sure you have set individual, unique resource labels for both the plain S3 and Fusion compute environments (see [02_setup_compute](../02_setup_compute/README.md#Appendix) for details).
- If using private repositories, you have added your GitHub (or other VCS provider) credentials to the Seqera Platform workspace.
- You have reviewed and updated the environment variables in [env.sh](../01_setup_environment/env.sh) to match your specific Platform setup.

11 changes: 0 additions & 11 deletions 03_setup_pipelines/pipelines/nextflow.config

```groovy
process {
    resourceLabels = {[
        uniqueRunId: System.getenv("TOWER_WORKFLOW_ID"),
        pipelineProcess: task.process.toString(),
        pipelineTag: task.tag.toString(),
        pipelineCPUs: task.cpus.toString(),
        pipelineMemory: task.memory.toString(),
        pipelineTaskAttempt: task.attempt.toString(),
        pipelineContainer: task.container.toString(),
        taskHash: task.hash.toString(),
        pipelineUser: workflow.userName.toString(),
        pipelineRunName: workflow.runName.toString(),
        pipelineSessionId: workflow.sessionId.toString(),
        pipelineResume: workflow.resume.toString(),
        pipelineRevision: workflow.revision.toString(),
        pipelineCommitId: workflow.commitId.toString(),
        pipelineRepository: workflow.repository.toString(),
        pipelineName: workflow.manifest.name.toString()
    ]}
}
```

5 changes: 4 additions & 1 deletion 05_generate_report/README.md

The YAML configurations utilize environment variables defined in the `env.sh` file.

Besides these environment variables, there are a few Nextflow parameters that need to be configured based on your setup. Go directly into `./pipelines/nextflow.config` and modify the following variables:

1) If you are an enterprise customer, please change `seqera_api_endpoint` to your Seqera Platform deployment URL. The person who set up your Enterprise deployment will know this address.

2) Set `benchmark_aws_cur_report` to the AWS CUR report containing the cost information for your runs. You can provide the direct S3 path to this file if your credentials in Seqera Platform have access to it. Otherwise, please upload the parquet report to an S3 bucket accessible by the AWS credentials associated with your compute environment.

> **Exception**: If you cannot use the resource labels we suggested, leave `benchmark_aws_cur_report` set to `null` and compile the report without task-level costs. The cost comparison will be done at the pipeline level via your Cost Explorer access.

> **Note**: If you are using a Seqera Platform Enterprise instance that is secured with a private CA SSL certificate not recognized by the default Java certificate authorities, you will need to amend the params section in the [nf-aggregate.yml](../launch/nf-aggregate-launch.yml) file before running the above seqerakit command, to specify a custom cacerts store path through `--java_truststore_path` and, optionally, a password with the `--java_truststore_password` pipeline parameter, as sketched below. This certificate will be used to achieve connectivity with your Seqera Platform instance through the API and CLI.
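
As a sketch, the amended `params` section of the launch YAML might look like this (the truststore path and password are placeholders; adapt the layout to your existing file):

```yaml
    params:
      java_truststore_path: "/path/to/custom/cacerts"
      java_truststore_password: "changeit"   # optional
```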

### 4. Add the samplesheet to Seqera Platform
To add the samplesheet to Seqera Platform, run the following command:
2 changes: 1 addition & 1 deletion 05_generate_report/pipelines/nextflow.config

```groovy
params {
    seqera_api_endpoint = 'https://api.cloud.seqera.io'
    generate_benchmark_report = true
    benchmark_aws_cur_report = null
    remove_failed_tasks = false
}
```
1 change: 0 additions & 1 deletion 05_generate_report/pre-run.txt

This file was deleted.
