diff --git a/02_setup_compute/README.md b/02_setup_compute/README.md
index 96cfe7f..c162bf6 100644
--- a/02_setup_compute/README.md
+++ b/02_setup_compute/README.md
@@ -2,7 +2,7 @@
 This directory contains YAML configuration files for the creation of two compute environments:
 
-- `aws_fusion_nvme.yml`: This compute environment is designed to run on Amazon Web Services (AWS) Batch and uses Fusion V2 with the 6th generation intel instance type with NVMe storage.
+- `aws_fusion_nvme.yml`: This compute environment is designed to run on Amazon Web Services (AWS) Batch and uses Fusion V2 on SPOT instances with 6th-generation Intel instance types with NVMe storage and the Fusion snapshots feature activated. Fusion snapshots is a new feature in Fusion that allows you to snapshot and restore your machine when a spot interruption occurs.
 - `aws_plain_s3.yml`: This compute environment is designed to run on Amazon Web Services (AWS) Batch and uses plain AWS Batch with S3 storage.
 
 These YAML files provide best practice configurations for utilizing these two storage types in AWS Batch compute environments. The Fusion V2 configuration is tailored for high-performance workloads leveraging NVMe storage, while the plain S3 configuration offers a standard setup for comparison and workflows that don't require the advanced features of Fusion V2.
@@ -24,6 +24,23 @@ These YAML files provide best practice configurations for utilizing these two st
 - You have an S3 bucket for the Nextflow work directory.
 - You have reviewed and updated the environment variables in [env.sh](../01_setup_environment/env.sh) to match your specific AWS setup.
 
+### Using existing manual AWS queues in your compute environments
+
+#### Setting manual queues during CE creation with seqerakit
+
+If you are not standing up your compute queues with Batch Forge but instead use a manual setup approach, you will need to modify your YAML configurations: change `config-mode: forge` to `config-mode: manual` and add the following lines, pointing to your specific queues, to the YAML files.
+
+```yaml
+head-queue: "myheadqueue-head"
+compute-queue: "mycomputequeue-work"
+```
+
+Please note that with manual queues, the resource labels must already be attached to your queues; setting them on the Seqera Platform during CE creation will not work.
+
+#### Manually setting the launch template for Fusion
+
+If you are not using Batch Forge to set up your queues, you will also have to manually set the launch template for the instances in your Fusion queues. To do this, add the provided [Fusion launch template](./fusion_launch_template.txt) to your AWS Batch account, then clone your existing AWS compute environment and, during the Instance configuration step, choose the Fusion launch template you created.
+
 ### YAML format description
 
 #### 1. Environment Variables in the YAML
@@ -44,13 +61,13 @@ Using these variables allows easy customization of the compute environment confi
 
 #### 2. Fusion V2 Compute Environment
 
-If we inspect the contents of [`aws_fusion_nvme.yml`](./compute-envs/aws_fusion_nvme.yml) as an example, we can see the overall structure is as follows:
+Fusion snapshots is a new feature in Fusion that allows you to snapshot and restore your machine when a spot interruption occurs.
+If we inspect the contents of [`./compute-envs/aws_fusion_snapshots.yml`](./compute-envs/aws_fusion_snapshots.yml) as an example, we can see the overall structure is as follows:
 
 ```yaml
 compute-envs:
   - type: aws-batch
     config-mode: forge
-    name: "$COMPUTE_ENV_PREFIX_fusion_nvme"
+    name: "${COMPUTE_ENV_PREFIX}_fusion_snapshots"
     workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
     credentials: "$AWS_CREDENTIALS"
     region: "$AWS_REGION"
@@ -58,39 +75,37 @@ compute-envs:
     wave: True
     fusion-v2: True
     fast-storage: True
+    snapshots: True
     no-ebs-auto-scale: True
     provisioning-model: "SPOT"
-    instance-types: "c6id,m6id,r6id"
+    instance-types: "c6id.4xlarge,c6id.8xlarge,r6id.2xlarge,m6id.4xlarge,c6id.12xlarge,r6id.4xlarge,m6id.8xlarge"
     max-cpus: 1000
     allow-buckets: "$AWS_COMPUTE_ENV_ALLOWED_BUCKETS"
     labels: "storage=fusionv2,project=benchmarking"
     wait: "AVAILABLE"
     overwrite: False
 ```
-<details>
-<summary>Click to expand: YAML format explanation</summary>
-
-The top-level block `compute-envs` mirrors the `tw compute-envs` command. The `type` and `config-mode` options are seqerakit-specific. The nested options in the YAML correspond to options available for the Seqera Platform CLI command. For example, running `tw compute-envs add aws-batch forge --help` shows options like `--name`, `--workspace`, `--credentials`, etc., which are provided to the `tw compute-envs` command via this YAML definition.
-</details>
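For orientation, the YAML above maps to a CLI invocation along the lines of the sketch below. Only `--name`, `--workspace`, and `--credentials` are named in the help text referenced above; the remaining flags are assumptions following the same option-to-flag pattern:

```bash
# Hedged sketch of the command seqerakit assembles from the YAML definition
tw compute-envs add aws-batch forge \
  --name "${COMPUTE_ENV_PREFIX}_fusion_snapshots" \
  --workspace "$ORGANIZATION_NAME/$WORKSPACE_NAME" \
  --credentials "$AWS_CREDENTIALS" \
  --region "$AWS_REGION" \
  --work-dir "$AWS_WORK_DIR" \
  --provisioning-model SPOT \
  --max-cpus 1000
```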
+Note: When setting `snapshots: True`, Fusion, Wave, and fast instance storage are required; we have set them to `True` explicitly here for documentation purposes and consistency.
 
 #### Pre-configured Options in the YAML
 
-We've pre-configured several options to optimize your Fusion V2 compute environment:
+We've pre-configured several options to optimize your Fusion snapshots compute environment:
 
 | Option | Value | Purpose |
 |--------|-------|---------|
 | `wave` | `True` | Enables Wave, required for Fusion in containerized workloads |
 | `fusion-v2` | `True` | Enables Fusion V2 |
 | `fast-storage` | `True` | Enables fast instance storage with Fusion V2 for optimal performance |
+| `snapshots` | `True` | Enables automatic snapshot creation and restoration for spot instance interruptions |
 | `no-ebs-auto-scale` | `True` | Disables EBS auto-expandable disks (incompatible with Fusion V2) |
 | `provisioning-model` | `"SPOT"` | Selects cost-effective spot pricing model |
-| `instance-types` | `"c6id,m6id,r6id"` | Selects 6th generation Intel instance types with high-speed local storage |
+| `instance-types` | `"c6id.4xlarge,c6id.8xlarge,`
`r6id.2xlarge,m6id.4xlarge,`
`c6id.12xlarge,r6id.4xlarge,`
`m6id.8xlarge"` | Selects instance types with small memory and fast network to snapshot within AWS's time limit during spot reclamation. | | `max-cpus` | `1000` | Sets maximum number of CPUs for this compute environment | -These options ensure your Fusion V2 compute environment is optimized for performance and cost-effectiveness. +These options ensure your Fusion V2 compute environment is optimized. -#### 2. Plain S3 Compute Environment +#### 3. Plain S3 Compute Environment Similarly, if we inspect the contents of [`aws_plain_s3.yml`](./compute-envs/aws_plain_s3.yml) as an example, we can see the overall structure is as follows: @@ -169,7 +184,7 @@ We will additionally use process-level labels for further granularity, this is d To add labels to your compute environment: 1. In the YAML file, locate the `labels` field. -2. Add your desired labels as a comma-separated list of key-value pairs. We have pre-populated this with the `storage=fusion|plains3` and `project=benchmarking` labels for better organization. +2. Add your desired labels as a comma-separated list of key-value pairs. We have pre-populated this with the `storage=fusion|plains3` and `project=benchmarking` labels for better organization. If you have a pre-existing label, you can use this here as well. For example, if you have previously used the `project` label and it is activated in AWS, you could use `project=fusion_poc_plainS3CE` and `project=fusion_poc_fusionCE` to distinguish the two compute environments. ### Networking If your compute environments require custom networking setup using a custom VPC, subnets, and security groups, these can be added as additional YAML fields. diff --git a/02_setup_compute/compute-envs/aws_fusion_nvme.yml b/02_setup_compute/compute-envs/aws_fusion_nvme.yml index 49e272e..be4c7e6 100644 --- a/02_setup_compute/compute-envs/aws_fusion_nvme.yml +++ b/02_setup_compute/compute-envs/aws_fusion_nvme.yml @@ -9,6 +9,7 @@ compute-envs: wave: True fusion-v2: True fast-storage: True + snapshots: True no-ebs-auto-scale: True provisioning-model: "SPOT" instance-types: "c6id,m6id,r6id" diff --git a/02_setup_compute/fusion_launch_template.txt b/02_setup_compute/fusion_launch_template.txt new file mode 100644 index 0000000..5aa08ef --- /dev/null +++ b/02_setup_compute/fusion_launch_template.txt @@ -0,0 +1,83 @@ + MIME-Version: 1.0 + Content-Type: multipart/mixed; boundary="//" + + --// + Content-Type: text/cloud-config; charset="us-ascii" + + #cloud-config + write_files: + - path: /root/tower-forge.sh + permissions: 0744 + owner: root + content: | + #!/usr/bin/env bash + ## Stop the ECS agent if running + systemctl stop ecs + exec > >(tee /var/log/tower-forge.log|logger -t BatchForge -s 2>/dev/console) 2>&1 + ## + yum install -q -y jq sed wget unzip nvme-cli lvm2 + curl -s https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm -o amazon-cloudwatch-agent.rpm + rpm -U ./amazon-cloudwatch-agent.rpm + rm -f ./amazon-cloudwatch-agent.rpm + curl -s https://nf-xpack.seqera.io/amazon-cloudwatch-agent/config-v0.4.json \ + | sed 's/$FORGE_ID/ambry-example/g' \ + > /opt/aws/amazon-cloudwatch-agent/bin/config.json + /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \ + -a fetch-config \ + -m ec2 \ + -s \ + -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json + mkdir -p /scratch/fusion + NVME_DISKS=($(nvme list | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }')) + NUM_DISKS=$${#NVME_DISKS[@]} + if (( NUM_DISKS > 0 )); then + if (( NUM_DISKS == 1 
)); then
+            mkfs -t xfs $${NVME_DISKS[0]}
+            mount $${NVME_DISKS[0]} /scratch/fusion
+          else
+            pvcreate $${NVME_DISKS[@]}
+            vgcreate scratch_fusion $${NVME_DISKS[@]}
+            lvcreate -l 100%FREE -n volume scratch_fusion
+            mkfs -t xfs /dev/mapper/scratch_fusion-volume
+            mount /dev/mapper/scratch_fusion-volume /scratch/fusion
+          fi
+        fi
+        chmod a+w /scratch/fusion
+        mkdir -p /etc/ecs
+        echo ECS_IMAGE_PULL_BEHAVIOR=once >> /etc/ecs/ecs.config
+        echo ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true >> /etc/ecs/ecs.config
+        echo ECS_ENABLE_SPOT_INSTANCE_DRAINING=true >> /etc/ecs/ecs.config
+        echo ECS_CONTAINER_CREATE_TIMEOUT=10m >> /etc/ecs/ecs.config
+        echo ECS_CONTAINER_START_TIMEOUT=10m >> /etc/ecs/ecs.config
+        echo ECS_CONTAINER_STOP_TIMEOUT=10m >> /etc/ecs/ecs.config
+        echo ECS_MANIFEST_PULL_TIMEOUT=10m >> /etc/ecs/ecs.config
+        systemctl stop docker
+        ## install AWS CLI v2
+        curl "https://awscli.amazonaws.com/awscli-exe-linux-$(arch).zip" -o "awscliv2.zip"
+        unzip -q awscliv2.zip
+        sudo ./aws/install
+        ## expand the EBS boot volume before restarting Docker and the ECS agent
+        TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
+        INSTANCEID=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
+        X_ZONE=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -fs http://169.254.169.254/latest/meta-data/placement/availability-zone)
+        AWS_DEFAULT_REGION="`echo \"$X_ZONE\" | sed 's/[a-z]$//'`"
+        VOLUMEID=$(aws --region $AWS_DEFAULT_REGION ec2 describe-instances --instance-id $INSTANCEID | jq -r .Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId)
+        aws --region $AWS_DEFAULT_REGION ec2 modify-volume --volume-id $VOLUMEID --size 100 --volume-type gp3 --throughput 325
+        i=1; until [ "$(aws --region $AWS_DEFAULT_REGION ec2 describe-volumes-modifications --volume-id $VOLUMEID --filters Name=modification-state,Values="optimizing","completed" | jq '.VolumesModifications | length')" == "1" ] || [ $i -eq 256 ]; do
+          sleep $i
+          i=$(( i * 2 ))
+        done
+        if [ $i -eq 256 ]; then
+          echo "ERROR expanding EBS boot disk size"
+          aws --region $AWS_DEFAULT_REGION ec2 describe-volumes-modifications --volume-id $VOLUMEID
+        fi
+        growpart /dev/xvda 1
+        xfs_growfs -d /
+        systemctl start docker
+        systemctl enable --now --no-block ecs
+        echo "1258291200" > /proc/sys/vm/dirty_bytes
+        echo "629145600" > /proc/sys/vm/dirty_background_bytes
+
+  runcmd:
+    - bash /root/tower-forge.sh
+
+  --//--
\ No newline at end of file

diff --git a/03_setup_pipelines/README.md b/03_setup_pipelines/README.md
index f545b16..4425bf5 100644
--- a/03_setup_pipelines/README.md
+++ b/03_setup_pipelines/README.md
@@ -16,6 +16,7 @@
 - You have setup a Fusion V2 and plain S3 compute environment in the Seqera Platform in the [previous section](../02_setup_compute/README.md).
 - You have created an S3 bucket for saving the workflow outputs.
 - For effective use of resource labels, you have setup Split Cost Allocation tracking in your AWS account and activated the tags as mentioned in [this guide](../docs/assets/aws-split-cost-allocation-guide.md).
+  - **Exception**: If you cannot activate the resource labels we suggest here but can utilize existing resource labels, make sure you have set individual, unique resource labels for both the plain S3 and Fusion compute environments (see [02_setup_compute](../02_setup_compute/README.md#appendix) for details).
 - If using private repositories, you have added your GitHub (or other VCS provider) credentials to the Seqera Platform workspace.
 - You have reviewed and updated the environment variables in [env.sh](../01_setup_environment/env.sh) to match your specific Platform setup.

diff --git a/03_setup_pipelines/pipelines/nextflow.config b/03_setup_pipelines/pipelines/nextflow.config
index 5974752..2658fc1 100644
--- a/03_setup_pipelines/pipelines/nextflow.config
+++ b/03_setup_pipelines/pipelines/nextflow.config
@@ -3,18 +3,7 @@ process {
         uniqueRunId: System.getenv("TOWER_WORKFLOW_ID"),
         pipelineProcess: task.process.toString(),
         pipelineTag: task.tag.toString(),
-        pipelineCPUs: task.cpus.toString(),
-        pipelineMemory: task.memory.toString(),
-        pipelineTaskAttempt: task.attempt.toString(),
-        pipelineContainer: task.container.toString(),
         taskHash: task.hash.toString(),
-        pipelineUser: workflow.userName.toString(),
-        pipelineRunName: workflow.runName.toString(),
         pipelineSessionId: workflow.sessionId.toString(),
-        pipelineResume: workflow.resume.toString(),
-        pipelineRevision: workflow.revision.toString(),
-        pipelineCommitId: workflow.commitId.toString(),
-        pipelineRepository: workflow.repository.toString(),
-        pipelineName: workflow.manifest.name.toString()
     ]}
 }
\ No newline at end of file

diff --git a/05_generate_report/README.md b/05_generate_report/README.md
index 3216694..5e5ef41 100644
--- a/05_generate_report/README.md
+++ b/05_generate_report/README.md
@@ -59,9 +59,12 @@ The YAML configurations utilize environment variables defined in the `env.sh` fi
 Besides these environment variables, there are a few Nextflow parameters that need to be configured based on your setup. Go directly into `./pipelines/nextflow.config` and modify the following variables:
 
 1) If you are an enterprise customer, please change `seqera_api_endpoint` to your Seqera Platform deployment URL. The person who set up your Enterprise deployment will know this address.
+2) Set `benchmark_aws_cur_report` to the AWS CUR report containing the cost information for your runs. You can provide the direct S3 path to this file if your credentials in Seqera Platform have access to this file. Otherwise, please upload the parquet report to an S3 bucket accessible by the AWS credentials associated with your compute environment.
+
+> **Exception**: If you cannot use the resource labels we suggested, leave `benchmark_aws_cur_report` set to `null` and compile the report without task-level costs. The cost comparison will be done at the pipeline level via your Cost Explorer access.
+
 > **Note**: If you are using a Seqera Platform Enterprise instance that is secured with a private CA SSL certificate not recognized by default Java certificate authorities, you will need to amend the params section in the [nf-aggregate.yml](../launch/nf-aggregate-launch.yml) file before running the above seqerakit command, to specify a custom cacerts store path through `--java_truststore_path` and, optionally, a password with the `--java_truststore_password` pipeline parameter. This certificate will be used to achieve connectivity with your Seqera Platform instance through API and CLI.
 
-2) Set `benchmark_aws_cur_report` to the AWS CUR report containing your runs cost information. This can be the direct S3 link to this file if your credentials in Seqera Platform have access to this file, otherwise, please upload the parquet report to a bucket accesible by the AWS credentials associated with your compute environment.
 
 ### 4. Add the samplesheet to Seqera Platform
 To add the samplesheet to Seqera Platform, run the following command:

diff --git a/05_generate_report/pipelines/nextflow.config b/05_generate_report/pipelines/nextflow.config
index 84e8b00..b9eb3dd 100644
--- a/05_generate_report/pipelines/nextflow.config
+++ b/05_generate_report/pipelines/nextflow.config
@@ -4,5 +4,5 @@ params {
     seqera_api_endpoint = 'https://api.cloud.seqera.io'
     generate_benchmark_report = true
     benchmark_aws_cur_report = null
-    remove_failed_tasks = true
+    remove_failed_tasks = false
 }
\ No newline at end of file

diff --git a/05_generate_report/pre-run.txt b/05_generate_report/pre-run.txt
deleted file mode 100644
index 31911b7..0000000
--- a/05_generate_report/pre-run.txt
+++ /dev/null
@@ -1 +0,0 @@
-export NXF_VER=24.10.4
\ No newline at end of file

diff --git a/06_fusion_snapshots/00_env.md b/06_fusion_snapshots/00_env.md
deleted file mode 100644
index 01342d8..0000000
--- a/06_fusion_snapshots/00_env.md
+++ /dev/null
@@ -1,96 +0,0 @@
-# Introduction to seqerakit
-
-This section sets up the environment variables for the Fusion snapshots benchmarking guide.
-
-### Prerequisites
-
-Before proceeding with this tutorial, ensure you have the following:
-
-1. Access to a Seqera Platform instance with:
-   - A [Workspace](https://docs.seqera.io/platform/23.3.0/orgs-and-teams/workspace-management)
-   - [Maintain](https://docs.seqera.io/platform/23.3.0/orgs-and-teams/workspace-management#participant-roles) user permissions or higher within the Workspace being used for benchmarking
-   - An [Access token](https://docs.seqera.io/platform/23.3.0/api/overview#authentication) for the Seqera Platform CLI
-
-2. Software dependencies installed as outlined in the [installation documentation](./installation.md):
-   - [`seqerakit >=0.4.3`](https://github.com/seqeralabs/seqera-kit#installation)
-   - [Seqera Platform CLI (`>=0.13.0`)](https://github.com/seqeralabs/tower-cli#1-installation)
-   - [Python (`>=3.8`)](https://www.python.org/downloads/)
-   - [PyYAML](https://pypi.org/project/PyYAML/)
-   - **Note**: Seqera Platform CLI `>=0.13.0` is required for Fusion snapshots.
-
-3. Basic familiarity with YAML file format and environment variables
-
-## Using seqerakit
-
-`seqerakit` is an infrastructure-as-code (IaC) tool built on top of the [Seqera Platform CLI](https://github.com/seqeralabs/tower-cli) that allows you to define and reproduce your Seqera Platform setup using configuration files. By using this approach, you can ensure consistency and scalability. The configuration is written in YAML, a format chosen for its simplicity and readability while remaining flexible enough to meet the needs of this tool. For this tutorial, we've provided all the relevant YAML files.
-
-At its core, `seqerakit` is designed to simplify access to the Seqera Platform CLI by allowing you to set command-line options within YAML configuration files. While you also have the option to launch `seqerakit` via [Python](https://github.com/seqeralabs/seqera-kit#launch-via-a-python-script), this tutorial will focus solely on the YAML-based configuration approach.
-
-### Dynamic settings
-
-`seqerakit` can evaluate **environment variables** defined within your YAML files. This approach adds a useful layer of flexibility, especially for settings that change often. By using environment variables, you can reuse the same YAML configuration across different contexts without hardcoding values.
-
-For example, in this tutorial, we will use an environment variable called `ORGANIZATION_NAME`. This allows you to easily specify the name of the organization you're using within your Seqera Platform instance, making it adaptable to different setups without modifying the YAML file itself.
-
-You can see this being used in the following example. Here, the pipeline will be launched in the workspace defined as `WORKSPACE_NAME` in the `ORGANIZATION_NAME`.
-
-```yaml
-launch:
-  - name: "nf-hello-world"
-    workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
-    pipeline: "nf-hello-world"
-```
-
-## Define your environment variables
-
-In the next section, we will set the environment variables to your own settings.
-
-### Modify the `setup/env.sh` file
-
-All of the environment variables required for this tutorial have been pre-defined in [`setup/env.sh`](setup/env.sh). Edit this file directly and amend any entries labelled as `[CHANGE ME]` to customise them to align with your Seqera Platform instance. The following settings must be set to your target resources:
-
-- `ORGANIZATION_NAME`: Seqera Platform Organization name
-- `WORKSPACE_NAME`: Seqera Platform Workspace name
-- `COMPUTE_ENV_PREFIX`: A short prefix to be used for naming compute environments. This can be an informative prefix for the credentials/AWS account being used or the location of the compute being used (e.g. "aws-virginia", "benchmark-virginia")
-- `TIME`: Timestamp used by seqerakit to generate dynamic names for launched pipeline runs and their corresponding output directories.
-
-Modify the [`setup/env.sh`](setup/env.sh) file so these variables reflect your settings. Each variable must exactly match your resources, or the rest of the tutorial will not work. The final `env.sh` file should look like this.
-
-```bash
-export ORGANIZATION_NAME=your_organization_name
-export WORKSPACE_NAME=your_workspace_name
-export COMPUTE_ENV_PREFIX=your_compute_env_prefix
-export TIME=`date +"%Y%m%d-%H%M%S"`
-```
-
-#### Set the `TOWER_ACCESS_TOKEN`
-
-If you haven't already exported your `TOWER_ACCESS_TOKEN` via your `.bashrc` as highlighted in the [installation guide](./installation.md#access-token-all-customers), you can also add and `export` it in [`env.sh`](env.sh) using the command below:
-
-```bash
-export TOWER_ACCESS_TOKEN=<your access token>
-```
-
-#### Add the environment variables
-
-You will need to inject the environment variables you have defined in the previous step into the executing environment. At runtime, `seqerakit` will replace these environment variables with their values.
-
-You can run the following command in the root directory of this tutorial material:
-
-```bash
-source ./setup/env.sh
-```
-
-You can check that the environment variables are available as expected by running:
-
-```bash
-echo $ORGANIZATION_NAME
-community
-```
-
-Note, this needs to be done in the same shell as the rest of the tutorial. If you close or otherwise reset the shell, the environment variables must be loaded again.
-
-## Next Steps
-
-After following this tutorial, you should have set all environment variables and they should be available in your shell. After completion of this step, [please proceed to the next step](./01_compute_envs.md).

diff --git a/06_fusion_snapshots/01_compute_envs.md b/06_fusion_snapshots/01_compute_envs.md
deleted file mode 100644
index eb20283..0000000
--- a/06_fusion_snapshots/01_compute_envs.md
+++ /dev/null
@@ -1,174 +0,0 @@
-# Fusion Snapshots
-
-Fusion snapshots is a new feature in Fusion that allows you to snapshot and restore your machine when a spot interruption occurs.
- -This guide will give you a direct comparison between a compute environment with Fusion using on-demand instances and a compute environment with Fusion using spot instances that have been configured to use the snapshot feature. - -### Pre-requisites - -- You have access to the Seqera Platform. -- You have set up AWS credentials in the Seqera Platform workspace. - - Your AWS credentials have the correct IAM permissions if using [Batch Forge](https://docs.seqera.io/platform/24.1/compute-envs/aws-batch#batch-forge). -- You have an S3 bucket for the Nextflow work directory. -- You have reviewed and updated the environment variables in [env.sh](../01_setup_environment/env.sh) to match your specific AWS setup. -- You must have completed a Fusion benchmarking as per the rest of this guide. -- You must be enrolled on the Fusion snapshot preview program. - -### YAML format description - -#### 1. Environment Variables in the YAML - -The YAML configurations utilize environment variables defined in the [`env.sh`](./setup/env.sh) file. Here's a breakdown: - -| Variable | Description | Usage in YAML | -|----------|-------------|---------------| -| `$COMPUTE_ENV_PREFIX` | Prefix for compute environment name | `name` field | -| `$ORGANIZATION_NAME` | Seqera Platform organization | `workspace` field | -| `$WORKSPACE_NAME` | Seqera Platform workspace | `workspace` field | -| `$AWS_CREDENTIALS` | Name of AWS credentials | `credentials` field | -| `$AWS_REGION` | AWS region for compute | `region` field | -| `$AWS_WORK_DIR` | Path to Nextflow work directory | `work-dir` field | -| `$AWS_COMPUTE_ENV_ALLOWED_BUCKETS` | S3 buckets with read/write access | `allow-buckets` field | - -Using these variables allows easy customization of the compute environment configuration without directly modifying the YAML file, promoting flexibility and reusability. - -#### 2. Fusion V2 Compute Environment - -If we inspect the contents of [`./compute-envs/aws_fusion_ondemand.yml`](./compute-envs/aws_fusion_ondemand.yml) as an example, we can see the overall structure is as follows: - -```yaml -compute-envs: - - type: aws-batch - config-mode: forge - name: "${COMPUTE_ENV_PREFIX}_fusion_ondemand" - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" - credentials: "$AWS_CREDENTIALS" - region: "$AWS_REGION" - work-dir: "$AWS_WORK_DIR" - wave: True - fusion-v2: True - fast-storage: True - no-ebs-auto-scale: True - provisioning-model: "EC2" - instance-types: "c6id,m6id,r6id" - max-cpus: 1000 - allow-buckets: "$AWS_COMPUTE_ENV_ALLOWED_BUCKETS" - labels: storage=fusionv2,project=benchmarking" - wait: "AVAILABLE" - overwrite: False -``` -
-<summary>Click to expand: YAML format explanation</summary>
-
-The top-level block `compute-envs` mirrors the `tw compute-envs` command. The `type` and `config-mode` options are seqerakit-specific. The nested options in the YAML correspond to options available for the Seqera Platform CLI command. For example, running `tw compute-envs add aws-batch forge --help` shows options like `--name`, `--workspace`, `--credentials`, etc., which are provided to the `tw compute-envs` command via this YAML definition.
-</details>
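Before creating anything, you can preview the exact CLI calls seqerakit would issue by adding the `--dryrun` flag, used the same way for pipelines later in this guide; the output line below is illustrative rather than verbatim:

```bash
# Preview the commands without deploying anything to the platform
seqerakit --dryrun ./compute-envs/aws_fusion_ondemand.yml
# e.g. INFO:root:DRYRUN: Running command tw compute-envs add aws-batch forge --name ${COMPUTE_ENV_PREFIX}_fusion_ondemand ...
```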
- -#### 3. Fusion Snapshots Compute Environment - -Fusion snapshots is a new feature in Fusion that allows you to snapshot and restore your machine when a spot interruption occurs. If we inspect the contents of [`./compute-envs/aws_fusion_snapshots.yml`](./compute-envs/aws_fusion_snapshots.yml) as an example, we can see the overall structure is as follows: - -```YAML -compute-envs: - - type: aws-batch - config-mode: forge - name: "${COMPUTE_ENV_PREFIX}_fusion_snapshots" - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" - credentials: "$AWS_CREDENTIALS" - region: "$AWS_REGION" - work-dir: "$AWS_WORK_DIR" - wave: True - fusion-v2: True - fast-storage: True - snapshots: True - no-ebs-auto-scale: True - provisioning-model: "SPOT" - instance-types: "c6id.4xlarge,c6id.8xlarge,r6id.2xlarge,m6id.4xlarge,c6id.12xlarge,r6id.4xlarge,m6id.8xlarge" - max-cpus: 1000 - allow-buckets: "$AWS_COMPUTE_ENV_ALLOWED_BUCKETS" - labels: storage=fusionv2,project=benchmarking" - wait: "AVAILABLE" - overwrite: False -``` - -You should note it is very similar to the Fusion V2 compute environment, but with the following differences: - -- `provisioning-model` is set to `SPOT` to enable the use of spot instances. -- `snapshots` is set to True to allow Fusion to automatically restore a job if interrupted by spot reclamation -- `instance-types` are set to a very restrictive set of types that have sufficient memory and bandwidth to snapshot the machine within the time limit imposed by AWS during a spot reclamation event. - -Note: When setting `snapshots: True`, Fusion, Wave and fast-instance storage will be enabled by default for the CE. We have set these to `true` here for documentation purposes and consistency. - -#### Pre-configured Options in the YAML - -We've pre-configured several options to optimize your Fusion snapshots compute environment: - -| Option | Value | Purpose | -|--------|-------|---------| -| `wave` | `True` | Enables Wave, required for Fusion in containerized workloads | -| `fusion-v2` | `True` | Enables Fusion V2 | -| `fast-storage` | `True` | Enables fast instance storage with Fusion v2 for optimal performance | -| `snapshots` | `True` | Enables automatic snapshot creation and restoration for spot instance interruptions | -| `no-ebs-auto-scale` | `True` | Disables EBS auto-expandable disks (incompatible with Fusion V2) | -| `provisioning-model` | `"SPOT"` | Selects cost-effective spot pricing model | -| `instance-types` | `"c6id.4xlarge,c6id.8xlarge,`
`r6id.2xlarge,m6id.4xlarge,`
`c6id.12xlarge,r6id.4xlarge,`
`m6id.8xlarge"` | Selects instance types with a small enough memory footprint and fast enough network to snapshot the machine within the time limit imposed by AWS during a spot reclamation event. | -| `max-cpus` | `1000` | Sets maximum number of CPUs for this compute environment | - -These options ensure your Fusion V2 compute environment is optimized for compatibility with the snapshot feature. - -### Usage - -To fill in the details for each of the compute environments: - -1. Navigate to the `06_fusion_snapshots/compute-envs` directory. -2. Open the desired YAML file (`aws_fusion_ondemand.yml` or `aws_fusion_snapshots.yml`) in a text editor. -3. Review the details for each file. If you need to add: - - - Labels: See the [Labels](#labels) section. - - Networking: See the [Networking](#networking) section. - -4. Save the changes to each file. -5. Use these YAML files to create the compute environments in the Seqera Platform through seqerakit with the following commands. - - To create the Fusion V2 compute environment: - ```bash - seqerakit aws_fusion_ondemand.yml - ``` - - To create the Fusion Snapshots compute environment: - ```bash - seqerakit aws_fusion_snapshots.yml - ``` -6. Confirm your Compute Environments have been successfully created in the workspace and show a status of **'AVAILABLE'** indicating they are ready for use. - -## Appendix - -### Labels -Labels are name=value pairs that can be used to organize and categorize your AWS resources. In the context of our compute environments, labels can be useful for cost tracking and resource management. - -We will additionally use process-level labels for further granularity, this is described in the [03_setup_pipelines](../03_setup_pipelines/README.md) section. - -To add labels to your compute environment: - -1. In the YAML file, locate the `labels` field. -2. Add your desired labels as a comma-separated list of key-value pairs. We have pre-populated this with the `storage=fusion|plains3` and `project=benchmarking` labels for better organization. - -### Networking -If your compute environments require custom networking setup using a custom VPC, subnets, and security groups, these can be added as additional YAML fields. - -To add networking details to your compute environment: - -1. In the YAML files for both Fusion V2 and Plain S3, add the following fields, replacing the values with your networking details: - -```yaml - subnets: "subnet-aaaabbbbccccdddd1,subnet-aaaabbbbccccdddd2,subnet-aaaabbbbccccdddd3" - vpc-id: "vpc-aaaabbbbccccdddd" - security-groups: "sg-aaaabbbbccccdddd" -``` -**Note**: The values for your subnets, vpc-id and security groups must be a comma-separated string as shown above. - -2. Save your file and create your Compute Environments. - -## Next Steps - -Once this is completed, proceed to the [02_setup_pipelines](./02_setup_pipelines.md) section to setup your pipelines. diff --git a/06_fusion_snapshots/02_setup_pipelines.md b/06_fusion_snapshots/02_setup_pipelines.md deleted file mode 100644 index bef60fc..0000000 --- a/06_fusion_snapshots/02_setup_pipelines.md +++ /dev/null @@ -1,158 +0,0 @@ -## Setup pipelines on Seqera Platform - -### Table of contents -1. [Prerequisites](#1-prerequisites) -2. [Overview](#2-overview) -3. 
[Tutorial: Adding a test pipeline to the Launchpad](#3-tutorial-adding-a-test-pipeline-to-the-launchpad) - - [YAML format description](#yaml-format-description) - - [Environment variables in YAML](#1-environment-variables-in-yaml) - - [Pipeline YAML definition](#2-pipeline-yaml-definition) - - [Dry run mode](#dry-run-mode) - - [Adding the pipeline](#adding-the-pipeline) -4. [Add your workflow to the Launchpad](#4-add-your-workflow-to-the-launchpad) - -### 1. Prerequisites - -- You have setup a Fusion on-demand and Fusion snapshots compute environment in the Seqera Platform in the [previous section](./01_compute_envs.md). -- You have created an S3 bucket for saving the workflow outputs. -- For effective use of resource labels, you have setup Split Cost Allocation tracking in your AWS account and activated the tags as mentioned in [this guide](../docs/assets/aws-split-cost-allocation-guide.md). -- If using private repositories, you have added your GitHub (or other VCS provider) credentials to the Seqera Platform workspace. -- You have reviewed and updated the environment variables in [env.sh](../01_setup_environment/env.sh) to match your specific Platform setup. - -### 2. Overview - -This directory contains YAML configuration files to add your workflow to the Seqera Platform Launchpad, as well as add the [nextflow-io/hello](https://github.com/nextflow-io/hello) workflow to the Seqera Platform Launchpad: - -- `example_workflow_A_fusion_ondemand.yml`: This configuration is to setup your custom workflow for benchmarking to run on Fusion using on-demand instances. This workflow will use the `aws_fusion_ondemand` compute environment created in the [previous section](../02_setup_compute/README.md#3-fusion-snapshots-compute-environment). -- `example_workflow_A_fusion_snapshots.yml`: This configuration is to setup your custom workflow for benchmarking to run on Fusion with the experimental snapshot feature. This workflow will use the `aws_fusion_snapshots` compute environment created in the [previous section](../02_setup_compute/README.md#3-fusion-snapshots-compute-environment). -- `hello-world.yml`: This configuration is to setup the [nextflow-io/hello](https://github.com/nextflow-io/hello) workflow to run on the Seqera Platform. This workflow will use the `aws_fusion_nvme` compute environment created in the [previous section](../02_setup_compute/README.md#1-fusion-enabled-compute-environment). - -> **Note:** You can benchmark multiple workflows by copying and modifying these example configuration files. Simply create new YAML files for each workflow you want to test, adjusting the relevant parameters as needed. - -We can start by adding a simple Hello World pipeline to the Launchpad and then launching this in your chosen Workspace. This will ensure that `seqerakit` is working as expected and you are able to correctly add and launch a pipeline. - -### 3. Tutorial: Adding a test pipeline to the Launchpad - -Before we add our custom workflow to the Launchpad, let's start by adding the Hello World pipeline to the Launchpad as defined in [`hello_world_fusion_ondemand.yml`](./pipelines/hello_world_fusion_ondemand.yml) and [`hello_world_fusion_snapshots.yml`](./pipelines/hello_world_fusion_snapshots.yml). - -### YAML format description - -#### 1. Environment variables in YAML - -The YAML configurations utilize environment variables defined in the `env.sh` file. 
Here's a breakdown of the variables used in this context: - -| Variable | Description | Usage in YAML | -|----------|-------------|---------------| -| `$ORGANIZATION_NAME` | Seqera Platform organization | `workspace` field | -| `$WORKSPACE_NAME` | Seqera Platform workspace | `workspace` field | -| `$COMPUTE_ENV_PREFIX` | Prefix for compute environment name | `compute-env` field | -| `$PIPELINE_PROFILE` | Config profile to run your pipeline with | `profile` field | - -The `$PIPELINE_PROFILE` variable is defined in the `env.sh` file and can be used to specify a particular configuration profile for your pipeline. This allows you to easily switch between different sets of configuration options (e.g., 'test', 'production') without modifying the pipeline code or YAML files. - -#### 2. Pipeline YAML definition - -We can start by checking the YAML configuration file which defines the pipeline we will add to the workspace. The pipeline definition can be found at [`hello_world_fusion_ondemand.yml`](./pipelines/hello_world_fusion_ondemand.yml) and [`hello_world_fusion_snapshots.yml`](./pipelines/hello_world_fusion_snapshots.yml). Inspecting the contents here the file contains the following values: - -```yaml -pipelines: - - name: "nf-hello-world-test" - url: "https://github.com/nextflow-io/hello" - workspace: '$ORGANIZATION_NAME/$WORKSPACE_NAME' - description: "Classic Hello World script in Nextflow language." - compute-env: "${COMPUTE_ENV_PREFIX}_fusion_ondemand" - revision: "master" - overwrite: True -``` - -
-<summary>Click to expand: YAML format explanation</summary>
-
-The YAML file begins with a block starting with the key `pipelines`, which mirrors the command available on the Seqera Platform CLI to add pipelines to the Launchpad, i.e. `tw pipelines add`. To give you another example, if you would like to create a Compute Environment in the Seqera Platform, you would use the `tw compute-envs add` command and hence the `compute-envs` key in your YAML file, and so on.
-
-The nested options in the YAML also correspond to options available for that particular command on the Seqera Platform CLI. For example, if you run `tw pipelines add --help` you will see that `--name`, `--workspace`, `--description`, `--compute-env` and `--revision` are available as options, and will be provided to the `tw pipelines add` command as defined in this YAML via `seqerakit`. However, other options defined in the YAML such as `url` and `overwrite` have been added specifically to extend the functionality in `seqerakit`.
-</details>
-
-#### 3. Dry run mode
-
-Before we add the pipeline to the Launchpad, let's run `seqerakit` in dry run mode. This will print the CLI commands that will be executed by `seqerakit` without actually deploying anything to the platform.
-
-Run the following command in the root directory of this tutorial material:
-
-```bash
-seqerakit --dryrun ./pipelines/hello_world_fusion_ondemand.yml
-```
-
-You should see the following output appear in the shell:
-
-```shell
-INFO:root:DRYRUN: Running command tw pipelines add --name nf-hello-world-test --workspace $ORGANIZATION_NAME/$WORKSPACE_NAME --description 'Classic Hello World script in Nextflow language.' --compute-env aws_fusion_ondemand --revision master https://github.com/nextflow-io/hello
-```
-
-This indicates seqerakit is interpreting the YAML file and is able to run some commands. Check the commands written to the console. Do they look reasonable? If so, we can proceed to the next step.
-
-#### 4. Adding the pipeline
-
-We will now add the pipeline to the Launchpad by removing the `--dryrun` option from the command line:
-
-```bash
-seqerakit ./pipelines/hello_world_fusion_ondemand.yml
-```
-
-The output will look like:
-
-```shell
-DEBUG:root: Overwrite is set to 'True' for pipelines
-
-DEBUG:root: Running command: tw -o json pipelines list -w $ORGANIZATION_NAME/$WORKSPACE_NAME
-DEBUG:root: Running command: tw pipelines add --name nf-hello-world-test --workspace $ORGANIZATION_NAME/$WORKSPACE_NAME --description 'Classic Hello World script in Nextflow language.' --compute-env aws_fusion_ondemand --revision master https://github.com/nextflow-io/hello
-```
-
-Go to the Launchpad page of your workspace on the Seqera Platform. You should see the Hello World pipeline available to launch.
-
-![Hello World added to Launchpad](../docs/images/hello-world-pipelines-add.png)
-
-### 4. Add your own workflow to the Launchpad
-
-Add your workflow to the Launchpad by following these steps:
-
-1. Go to the `pipelines/` directory.
-2. Edit `example_workflow_A_fusion_ondemand.yml` with your workflow details:
-
-   - `url`: GitHub repository URL (ensure credentials are added for private repos)
-   - `description`: Brief workflow description
-   - `profile`: Workflow profile (e.g., `test` or `test_full` for nf-core/rnaseq)
-   - `revision`: Branch name, tag, or commit hash
-   - `params`: Workflow parameters (inline or via `params-file:`)
-   - `pre-run`: Path to pre-run script (optional)
-   - `labels`: Workflow labels for organization
-   - Other details are optional for customization.
-
-   A few of the details have been set for you in the example workflows. These are to ensure that the workflow run is configured to run on the Seqera Platform with the appropriate compute environment.
-
-   ---
-
-   **_NOTE:_** We have [specified a local path](../03_setup_pipelines/pipelines/example_workflow_A_fusion.yml#L9) to a [Nextflow config file](./pipelines/nextflow.config) through the `config:` option. This config file includes custom configuration settings for attaching resource labels to each process in the workflow. These resource labels will attach metadata such as the unique run ID, pipeline name, process name, and so on, to each task submitted to AWS Batch.
-
-   ```groovy
-   process {
-       resourceLabels = {[
-           uniqueRunId: System.getenv("TOWER_WORKFLOW_ID"),
-           pipelineProcess: task.process.toString(),
-           ...
-   ```
-
-   ---
-
-3. Save the file and close the text editor. Feel free to rename the file to something more descriptive of your workflow.
-4. Use these YAML files to add your workflows to the Seqera Platform Launchpad by running the following command:
-
-```bash
-seqerakit pipelines/example_workflow_A_*.yml
-```
-
-This will add all pipelines to the Seqera Platform Launchpad, and you will be able to see them in the Launchpad UI. Confirm your pipelines have been added to the Launchpad before moving on to the next step of launching them.
-
-## Next Steps
-
-Once this is completed, proceed to the [03_launch](./03_launch.md) section to launch your workflows.

diff --git a/06_fusion_snapshots/03_launch.md b/06_fusion_snapshots/03_launch.md
deleted file mode 100644
index 23c9ceb..0000000
--- a/06_fusion_snapshots/03_launch.md
+++ /dev/null
@@ -1,177 +0,0 @@
-# Run workflows for benchmarking on Seqera Platform
-
-## Table of contents
-1. [Prerequisites](#prerequisites)
-2. [Overview](#overview)
-3. [Launching hello workflow from the Launchpad](#launching-hello-workflow-from-the-launchpad)
-4. [Run benchmarks for the custom workflow](#run-benchmarks-for-the-custom-workflow)
-   - [YAML format description](#yaml-format-description)
-   - [Launching the custom workflow](#launching-the-custom-workflow)
-
-### 1. Prerequisites
-
-- You have set up a Fusion on-demand and a Fusion snapshots compute environment in the Seqera Platform in the [previous section](../02_setup_compute/README.md).
-- You have created an S3 bucket for saving the workflow outputs.
-- You have created an S3 bucket containing the input samplesheet for the workflow or have uploaded the samplesheet to the [workspace as a Dataset](https://docs.seqera.io/platform/24.1/data/datasets).
-- You have set up your custom and Hello World workflows on the Launchpad as described in the [previous section](../03_setup_pipelines/README.md).
-
-### 2. Overview
-
-This directory contains YAML configuration files to launch the workflows on the Seqera Platform:
-
-- `example_workflow_A_fusion_snapshots.yml`: This configuration is to launch the custom workflow on the Seqera Platform with the Fusion snapshots compute environment.
-- `example_workflow_A_fusion_ondemand.yml`: This configuration is to launch the custom workflow on the Seqera Platform with the Fusion on-demand compute environment.
-
-We will launch the Hello World workflow from the Launchpad to ensure that the Seqera Platform is working as expected with both the Fusion snapshots and on-demand compute environments before running the benchmarks for the custom workflow.
-
-## 3. Launching hello workflow from the Launchpad
-
-We have provided separate YAML files [`hello_world_fusion_ondemand.yml`](../launch/hello_world_fusion_ondemand.yml) and [`hello_world_fusion_snapshots.yml`](../launch/hello_world_fusion_snapshots.yml) that contain the appropriate configuration to launch the Hello World pipeline we just added to the Launchpad.
-
-These YAML files will append the date, through the `$TIME` variable set in `env.sh`, onto the run names. This can help you better organize your benchmarking runs, especially if you launch multiple iterations.
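For reference, a minimal sketch of what one of these launch YAML files likely contains is shown below; the exact field values are assumptions based on the pipeline and compute environment names used elsewhere in this guide:

```yaml
launch:
  - name: "nf-hello-world-fusion-ondemand-$TIME"    # $TIME suffix keeps repeated benchmark runs distinguishable
    pipeline: "nf-hello-world-test"                 # Launchpad entry added in the previous section
    workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
    compute-env: "${COMPUTE_ENV_PREFIX}_fusion_ondemand"
```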
-Use the command below to launch the pipelines with both compute environments:
-
-```shell
-seqerakit ./launch/hello_world*.yml
-```
-
-```shell
-DEBUG:root: Running command: tw launch nf-hello-world-test --name nf-hello-world-fusion-ondemand-$TIME --workspace $ORGANIZATION_NAME/$WORKSPACE_NAME
-DEBUG:root: Running command: tw launch nf-hello-world-test --name nf-hello-world-fusion-snapshots-$TIME --workspace $ORGANIZATION_NAME/$WORKSPACE_NAME
-```
-
-When you check the running pipelines tab of your Seqera Platform workspace, you should now see the Hello World pipelines being submitted for execution.
-
-You may have to wait for the pipeline to begin executing and eventually complete. If you observe any failures, you will need to fix these systematically. If you don't, put your feet up and put the kettle on before moving on to the next step to run the benchmarks.
-
-## 4. Run benchmarks for the custom workflow
-
-Now that we have verified that the Seqera Platform is working as expected with both the Fusion on-demand and Fusion snapshots compute environments, we can run the benchmarks for the custom workflow.
-
-We will use the same workflow configuration files that we used in the [previous section](./02_setup_pipelines.md).
-
-### YAML format description
-
-#### 1. Environment Variables in YAML
-
-The YAML configurations utilize environment variables defined in the `env.sh` file. Here's a breakdown of the variables used in the example YAML:
-
-| Variable | Description | Usage in YAML |
-|----------|-------------|---------------|
-| `$TIME` | Current date and time | `name` field (appended to run name) |
-| `$ORGANIZATION_NAME` | Seqera Platform organization | `workspace` field |
-| `$WORKSPACE_NAME` | Seqera Platform workspace | `workspace` field |
-| `$COMPUTE_ENV_PREFIX` | Prefix for compute environment name | `compute-env` field |
-| `$PIPELINE_OUTDIR_PREFIX` | Prefix for pipeline output directory | `params.outdir` field |
-
-Using these variables allows easy customization of the launch configuration without directly modifying the YAML file, promoting flexibility and reusability across different environments and runs.
-
-If we inspect the contents of [`launch/example_workflow_A_fusion_ondemand.yml`](../launch/example_workflow_A_fusion_ondemand.yml) as an example, we can see the overall structure is the same as what we used when adding pipelines.
-
-#### 2. Pipeline YAML definition
-
-The YAML file for launching a pipeline follows a specific structure. Let's examine the key components of this structure using an example:
-
-```yaml
-launch:
-  - name: "your_pipeline_name-$TIME-fusion-ondemand"
-    pipeline: "your_pipeline_name"
-    workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
-    compute-env: "${COMPUTE_ENV_PREFIX}_fusion_ondemand"
-    params:
-      outdir: '$PIPELINE_OUTDIR_PREFIX/your_pipeline_name/results'
-      input: 's3://your-bucket/input/samplesheet.csv'
-```
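As a rough guide, the CLI call that seqerakit assembles from this YAML looks approximately like the sketch below; the exact flags may differ, and how the `params:` block is passed (e.g. via a generated params file) is an assumption here:

```bash
tw launch your_pipeline_name \
  --name "your_pipeline_name-$TIME-fusion-ondemand" \
  --workspace "$ORGANIZATION_NAME/$WORKSPACE_NAME" \
  --compute-env "${COMPUTE_ENV_PREFIX}_fusion_ondemand" \
  --params-file params.yml   # assumed: seqerakit writes the params: block to a temporary file
```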
-<details>
-<summary>Click to expand: YAML structure explanation</summary>
-
-The top-level block is now `launch`, which mirrors the `tw launch` command available on the Seqera Platform CLI to launch pipelines from source or from the Launchpad.
-
-The nested options in the YAML also correspond to options available for that particular command on the Seqera Platform CLI. If you run `tw launch --help`, you will see that `--name`, `--workspace`, `--profile`, `--labels`, `--pre-run` and `--config` are available as options and will be provided to the `tw launch` command via this YAML definition. The `pipeline:` entry can be used to either specify the name of a pipeline that exists on the Launchpad, or a URL to a pipeline repository if running from source, e.g. "https://github.com/nf-core/rnaseq". Here, we are using the pipeline name to launch the pipeline from the Launchpad that we set up earlier in the [previous section](../03_setup_pipelines/README.md).
-
-</details>
-
-#### Run Names and Parameters
-
-##### Run Names
-- Run names are appended with datetime and storage type (e.g., fusion-ondemand, fusion-snapshots)
-- This naming convention helps organize your runs
-- Feel free to modify or add more information to run names as needed
-
-##### The `params` Section
-- The `params` section in the YAML file is a `seqerakit`-specific option that allows you to define pipeline parameters directly within the YAML block, rather than in a separate file.
-- This provides a convenient way to specify run-specific parameters within YAML
-- For instance, many bioinformatics pipelines, including those from nf-core, use the `--outdir` parameter to specify where the final results should be stored. By including this in the `params` section of your YAML, you can easily set this for each run.
-- If you've already defined pipeline parameters when you added the pipeline to the Launchpad, and you don't need to override or add any parameters for this specific run, you can omit the `params` section from your launch YAML file.
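Alternatively, parameters can live in a separate file referenced through seqerakit's `params-file:` option (mentioned in the previous section). A minimal sketch, with a hypothetical file path:

```yaml
launch:
  - name: "your_pipeline_name-$TIME-fusion-ondemand"
    pipeline: "your_pipeline_name"
    workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
    compute-env: "${COMPUTE_ENV_PREFIX}_fusion_ondemand"
    params-file: "./params/your_pipeline_params.yml"   # hypothetical path to a YAML file of pipeline parameters
```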
-<details>
-<summary>Using Datasets as input</summary>
-
-> **Note**
-> If you would like to use a Dataset as input, you can also include the URL to the dataset as your `input:` parameter. To do this, you can run the following CLI command to retrieve the URL:
->
-> ```bash
-> tw datasets url -n <dataset-name> -w $ORGANIZATION_NAME/$WORKSPACE_NAME
-> ```
-> This command will return a URL that you can then provide as the value for the `input:` parameter:
->
-> ```yaml
-> input: https://api.cloud.seqera.io/workspaces/138659136604200/datasets/7DPM3wJTa6zDROKw6SGFLg/v/2/n/rnaseq-samples.csv
-> ```
-
-</details>
-
-#### Additional Configuration Options
-
-You can specify local paths to customize your pipeline execution:
-
-1. **Nextflow config file**: Use the `config:` option
-2. **Pre-run script**: Use the `pre-run:` option
-
-These files are provided as empty placeholders in the repository:
-
-<details>
-<summary>Click to expand: Additional configuration options and pre-run script</summary>
-
-- They allow you to override specific options during benchmarking
-- The options are commented out in the provided YAML files
-- You can uncomment and use them as needed
-- See the Pipeline Configuration and Pre-run Script sections for more details; a minimal pre-run example follows below
-
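For example, a pre-run script can be as small as pinning the Nextflow version for your benchmarking runs; the version below mirrors the `pre-run.txt` that previously lived in `05_generate_report/`:

```bash
# pre-run script: executed before the pipeline starts
export NXF_VER=24.10.4
```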
- -### 3. Launching the custom workflow - -We will now launch the custom workflow from the Launchpad using the YAML files we have defined in this repository. From the current directory, run the command below to launch the pipeline with the Fusion V2 compute environment: - -```bash -seqerakit launch/example_workflow_A_fusion_ondemand.yml -``` - -You should now see the custom workflow being submitted for execution in the Runs page of your Workspace on the Seqera Platform. - -Similarly, you can launch the pipeline with the Fusion snapshots compute environment by running the command below: - -```bash -seqerakit launch/example_workflow_A_fusion_snapshots.yml -``` - -Note, you can also specify paths to one or more named YAMLs present in the [`/launch`](./launch/) directory too to launch multiple pipelines in a single command: - -```bash -seqerakit launch/example_workflow_A_fusion_ondemand.yml launch/example_workflow_A_fusion_snapshots.yml -``` - -Even shorter, you can glob the YAML files to launch multiple pipelines in a single command: - -```bash -seqerakit launch/*.yml -``` - -You may have to wait for the pipeline to begin executing and eventually complete. If you observe any failures, you will need to fix these systematically. If you don't, put your feet up and put the kettle on before moving on to the next step to run the benchmarks. - -Before proceeding to the final part of this tutorial, ensure that the pipeline completes successfully at least once on both compute environments. Any failures may indicate infrastructure issues that should be addressed before attempting to run the pipeline on real-world datasets. For troubleshooting assistance, refer to the options in the [Support](../01_setup_environment/installation.md#support) section. - -After confirming successful runs, you can move on to the next section, [04_generate_report](./04_generate_report.md) where you will pull run metrics from the Seqera Platform. This will allow you to compare the performance of your custom workflow across the Fusion on-demand and Fusion snapshots compute environments. \ No newline at end of file diff --git a/06_fusion_snapshots/04_generate_report.md b/06_fusion_snapshots/04_generate_report.md deleted file mode 100644 index 6b3adb6..0000000 --- a/06_fusion_snapshots/04_generate_report.md +++ /dev/null @@ -1,153 +0,0 @@ -# Generate a Benchmarking Report from Pipeline Results - -## Table of Contents -1. [Prerequisites](#1-prerequisites) -2. [Overview](#2-overview) -3. [Generate the Report](#3-generate-the-report) - - [1. Fetch Run Dumps from Seqera Platform](#1-fetch-run-dumps-from-seqera-platform) - - [2. Generate a Samplesheet](#2-generate-a-samplesheet) - - [3. Compile the Benchmark Report](#3-compile-the-benchmark-report) -4. [Interpreting the Results](#4-interpreting-the-results) - ---- - -### 1. Prerequisites - -- You have successful executions of your Nextflow pipeline(s) with both Fusion on-demand and Fusion snapshots. -- You have access to the Seqera Platform workspace with the completed runs. -- You have access to an AWS cost and usage report (CUR) in parquet format, containing cost information for your benchmarking runs with the resource labels we have set up with you. - -### 2. Overview -To compile an interactive report comparing your Fusion on-demand and Fusion snapshots runs, we will utilize the [nf-aggregate](https://github.com/seqeralabs/nf-aggregate) pipeline developed by Seqera. 
This pipeline will fetch the detailed report logs for your runs directly from the Seqera Platform using the [tw CLI](https://github.com/seqeralabs/tower-cli) and generate a report using [Quarto](https://quarto.org/), an open-source publishing system similar to RMarkdown in R.
-
-This containerized Nextflow pipeline requires a samplesheet that includes the workflow IDs, the workspace names where your runs were executed, and the grouping assignment for each run (either 'Fusion on-demand' or 'Fusion snapshots'). A template for the samplesheet is available in `nf_aggregate_samplesheet.csv`.
-
-```
-id,workspace,group
-run_id_1,org/workspace,fusion_snapshots
-run_id_2,org/workspace,fusion_ondemand
-```
-
-Below is an example from the community/showcase, featuring real workflow IDs and workspace declarations to illustrate the correct formatting.
-
-```
-id,workspace,group
-3VcLMAI8wyy0Ld,community/showcase,group1
-4VLRs7nuqbAhDy,community/showcase,group2
-```
-
-You can directly enter your information in `nf_aggregate_samplesheet.csv` and overwrite the template so you can plug and play the seqerakit scripts in this folder.
-
-### 3. Configuring nf-aggregate
-
-Now that you have your samplesheet ready, we will start the nf-aggregate run using seqerakit, similar to how we executed the benchmarking runs. For the seqerakit scripts in this folder, we assume you are reusing the already configured compute environment (CE) used for the Fusion runs.
-
-This directory contains YAML configuration files to add nf-aggregate to your Seqera Platform Launchpad and use your configured samplesheet as input to nf-aggregate for compiling the benchmarking reports:
-
-- `datasets/nf-aggregate-dataset.yml`: This configuration will add the samplesheet for your benchmark runs as a dataset to Seqera Platform.
-- `pipelines/nf-aggregate-pipeline.yml`: This configuration is to set up the nf-aggregate workflow for compiling benchmark reports from your workflow runs and your AWS cost and usage report (CUR).
-- `launch/nf-aggregate-launch.yml`: This configuration is to launch the nf-aggregate workflow on the Seqera Platform with the Fusion V2 compute environment.
-
-The YAML configurations utilize environment variables defined in the `env.sh` file found in [01_setup_environment](../01_setup_environment/env.sh). Here's a breakdown of the variables used in this context:
-
-| Variable | Description | Usage in YAML |
-|----------|-------------|---------------|
-| `$ORGANIZATION_NAME` | Seqera Platform organization | `workspace` field |
-| `$WORKSPACE_NAME` | Seqera Platform workspace | `workspace` field |
-| `$COMPUTE_ENV_PREFIX` | Prefix for compute environment name | `compute-env` field |
-| `$PIPELINE_OUTDIR_PREFIX` | Prefix for pipeline output directory | `params.outdir` field |
-
-Besides these environment variables, there are a few Nextflow parameters that need to be configured based on your setup. Go directly into `./pipelines/nextflow.config` and modify the following variables:
-
-1) If you are an enterprise customer, please change `seqera_api_endpoint` to your Seqera Platform deployment URL. The person who set up your Enterprise deployment will know this address.
-2) Set `benchmark_aws_cur_report` to the AWS CUR report containing the cost information for your runs. You can provide the direct S3 path to this file if your credentials in Seqera Platform have access to this file. Otherwise, please upload the parquet report to an S3 bucket accessible by the AWS credentials associated with your compute environment.
-
-> **Note**: If you are using a Seqera Platform Enterprise instance that is secured with a private CA SSL certificate not recognized by default Java certificate authorities, you will need to amend the params section of the [nf-aggregate-launch.yml](./report/launch/nf-aggregate-launch.yml) file before launching nf-aggregate in step 7 below, to specify a custom cacerts store path through the `--java_truststore_path` and, optionally, a password with the `--java_truststore_password` pipeline parameters. This certificate will be used to achieve connectivity with your Seqera Platform instance through the API and CLI.
-
-### 4. Add the samplesheet to Seqera Platform
-To add the samplesheet to Seqera Platform, run the following command:
-
-```shell
-seqerakit -j ./report/dataset/nf-aggregate-dataset.yml
-```
-This will return a JSON object with the dataset name, workspace, and dataset ID.
-
-### 5. Retrieve the Dataset URL
-You will need the Dataset URL from the Platform in order to launch nf-aggregate from the command line. Use the following command to get the Dataset URL and create a new environment variable called `$DATASET_URL`:
-
-```shell
-export DATASET_URL=$(tw -o json datasets url --name fusion-snapshots-benchmark-samplesheet --workspace $ORGANIZATION_NAME/$WORKSPACE_NAME | jq .datasetUrl | tr -d '"')
-```
-You can double-check the value of `$DATASET_URL` with the command below:
-
-```shell
-echo $DATASET_URL
-
-# Example output:
-# https://api.cloud.seqera.io/workspaces/100452700310173/datasets/4f2d6orAHG5j7YY1DQtEzP/v/1/n/nf_aggregate_samplesheet.csv
-
-```
-
-### 6. Add nf-aggregate to the Launchpad
-
-To add nf-aggregate to the Launchpad with the proper configuration, we will use seqerakit as we have done for running the benchmarks.
-
-This configuration includes an `input` parameter that is set to the `DATASET_URL` environment variable.
-
-```shell
-seqerakit ./report/pipelines/nf-aggregate-pipeline.yml
-```
-
-### 7. Launch nf-aggregate
-Once the pipeline is added to the Launchpad, you can launch nf-aggregate with seqerakit.
-
-This configuration includes an `outdir` parameter that is set to the `PIPELINE_OUTDIR_PREFIX` environment variable to store the reports generated by the nf-aggregate run.
-
-```shell
-seqerakit ./report/launch/nf-aggregate-launch.yml
-```
-
-This will launch nf-aggregate to compile the benchmarking report. Once the pipeline finishes, head to the 'Runs' tab to download the HTML report.
-
-### 8. Interpreting the results
-
-The benchmark report includes various sections, ranging from high-level comparisons to detailed task-level analyses. Each section contains comments and explanations to guide you through the results.
-
-Upon completion, please share a copy of the report with the Seqera team, who can discuss the findings with you and walk you through the details.
-
-To learn more about each section of the report, see the [Appendix](#report-sections).
-
-## Appendix
-
-### Report sections
-
-#### Benchmark overview
-
-This section provides a general overview of the pipeline run IDs used in the report for each group. If a `runUrl` is found in the logs, the run IDs will be clickable links. Please note that access to the specific Seqera Platform deployment and workspace is required for these links to work.
-
-#### Run overview
-
-This section contains detailed information about the runs included in the report. It features a sortable and filterable table with technical details such as version numbers for pipelines and Nextflow, as well as information about the compute environment setup. Below the table, bar plots provide a visual comparison of key performance characteristics at the pipeline level.
-
-- **Accurate compute cost**: The total expense incurred by Nextflow tasks for AWS Elastic Compute Cloud (EC2) instances consumed during workflow execution, including both actively used and idle but allocated resources, retrieved from the AWS cost and usage report. This does not include the cost of the Nextflow head job or any costs other than EC2 (S3 transfer costs, VPC costs, FSx costs, etc.).
-- **Accurate used cost**: The cost of vCPU and memory resources that were actually allocated to and consumed by Amazon ECS tasks during the workflow execution period.
-- **Accurate unused cost**: The cost of vCPU and memory resources that were allocated to EC2 instances but remained unutilized by ECS tasks during the workflow execution period. This represents capacity that was reserved and paid for but not used.
-- **Total run time**: The cumulative execution duration across all Nextflow tasks used in the workflow, calculated by summing the run time of all individual tasks.
-- **CPU efficiency**: The percentage of allocated CPU resources that were actively utilized during task execution, calculated as (CPU time consumed / CPU time allocated) × 100%. Higher percentages indicate better utilization of provisioned CPU capacity.
-- **Memory efficiency**: The percentage of allocated memory that was actively used during task execution, calculated as (memory consumed / memory allocated) × 100%. Higher percentages indicate better utilization of provisioned memory resources.
-
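To make the efficiency formulas concrete, here is a small worked example; the numbers (8 vCPUs allocated for 2 hours, 10 CPU-hours consumed) are made up for illustration and do not come from any real run:

```bash
# Hypothetical task: 8 vCPUs allocated for 2 hours = 16 CPU-hours allocated,
# of which 10 CPU-hours were actually consumed by the task's processes.
echo "scale=1; 10 * 100 / (8 * 2)" | bc   # prints 62.5, i.e. a CPU efficiency of 62.5%
```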
-#### Process overview
-
-This section presents an overview of run times, combining both staging time and real execution time for all processes. It displays the mean run time for each task, with a one-standard-deviation range around the mean.
-
-#### Task overview
-
-This section provides insights into instance type usage and task staging and execution times.
-
-- **Task Instance Usage**: This subsection shows the number of tasks that ran on different instance types during pipeline runs, allowing for quick comparisons of instance type usage between groups. Users can hover over the stacked bar plots to view the detailed distribution of instance types and can use the legend to highlight or hide specific instance types.
-
-- **Task metrics**: The plots show pairwise correlations between the Fusion on-demand run and the Fusion snapshots run for both staging time (staging in and staging out) and real execution time. The dashed diagonal line represents perfect correlation between the two runs, meaning that if the tasks in both runs were exactly the same, all points would lie on the diagonal line.
\ No newline at end of file
diff --git a/06_fusion_snapshots/README.md b/06_fusion_snapshots/README.md
deleted file mode 100644
index 42656a0..0000000
--- a/06_fusion_snapshots/README.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Fusion Snapshot Benchmarking
-
-## Introduction
-
-The aim of this tutorial is to perform a standardized benchmarking of the Fusion snapshots feature in the Seqera Platform.
-
-The guide is similar to the Fusion benchmarking guide performed previously but focuses on testing the Fusion snapshots feature.
-
-## Overview
-
-This tutorial has been split up into 4 main components that you will need to complete in order:
-
-1. [Set up compute environments](./01_compute_envs.md)
-2. [Set up pipelines](./02_setup_pipelines.md)
-3. [Launch workflows](./03_launch.md)
-4. [Generate report](./04_generate_report.md)
-
-## Prerequisites
-
-You must have already completed testing Fusion and determined it is suitable for your use case before starting this guide. Snapshots is an experimental feature and errors may occur; it is important to isolate these from the use of Fusion or any other settings.
-
-## Preparation
-
-Before starting this tutorial, ensure you have the following prerequisites in place:
-
-1. Access to a Seqera Platform instance with:
- - A [Workspace](https://docs.seqera.io/platform/23.3.0/orgs-and-teams/workspace-management)
- - [Maintain](https://docs.seqera.io/platform/23.3.0/orgs-and-teams/workspace-management#participant-roles) user permissions or higher within the Workspace
- - An [Access token](https://docs.seqera.io/platform/23.3.0/api/overview#authentication) for the Seqera Platform CLI
-
-2. Software dependencies installed:
- - [`seqerakit >=0.5.2`](https://github.com/seqeralabs/seqera-kit#installation)
- - [Seqera Platform CLI (`>=0.13.0`)](https://github.com/seqeralabs/tower-cli#1-installation)
- - [Python (`>=3.8`)](https://www.python.org/downloads/)
- - [PyYAML](https://pypi.org/project/PyYAML/)
-
- **Note**: Seqera Platform CLI `>=0.13.0` is the minimum version required for Fusion snapshots.
-
- Before continuing with the tutorial, please refer to the [installation guide](docs/installation.md) to ensure you have access to all of the required software dependencies and established connectivity to the Seqera Platform via the `seqerakit` command-line interface.
-
-3. 
AWS resources, data and configurations: - - AWS credentials set up in the Seqera Platform workspace - - Correct IAM permissions for [Batch Forge](https://docs.seqera.io/platform/24.1/compute-envs/aws-batch#batch-forge) (if using) - - An S3 bucket for the Nextflow work directory - - An S3 bucket for saving workflow outputs - - An S3 bucket containing the input samplesheet (or uploaded to the [workspace as a Dataset](https://docs.seqera.io/platform/24.1/data/datasets)) - - Split Cost Allocation tracking set up in your AWS account with activated tags (see [this guide](./docs/assets/aws-split-cost-allocation-guide.md)) - - **Note**: Ensure that the `taskHash` label has also been activated. The guide was recently amended to include this label to enable retrieval of task costs for each unique hash without relying on the task names themselves. - -4. If using private repositories, add your GitHub (or other VCS provider) credentials to the Seqera Platform workspace - -5. Familiarity with: - - Basic YAML file format - - Environment variables - - Linux command line and common shell operations - - Seqera Platform and its features - -After ensuring all these prerequisites are met, you'll be ready to proceed with the tutorial steps for setting up and running infrastructure benchmarks on the Seqera Platform. - -### Continue - -When you are ready to proceed, please go to the [next section](./01_compute_envs.md) to set up the compute environments. diff --git a/06_fusion_snapshots/compute-envs/aws_fusion_ondemand.yml b/06_fusion_snapshots/compute-envs/aws_fusion_ondemand.yml deleted file mode 100644 index 2cfe0b1..0000000 --- a/06_fusion_snapshots/compute-envs/aws_fusion_ondemand.yml +++ /dev/null @@ -1,19 +0,0 @@ -compute-envs: - - type: aws-batch - config-mode: forge - name: "${COMPUTE_ENV_PREFIX}_fusion_ondemand" - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" - credentials: "$AWS_CREDENTIALS" - region: "$AWS_REGION" - work-dir: "$AWS_WORK_DIR" - wave: True - fusion-v2: True - fast-storage: True - no-ebs-auto-scale: True - provisioning-model: "EC2" - instance-types: "c6id,m6id,r6id" - max-cpus: 1000 - allow-buckets: "$AWS_COMPUTE_ENV_ALLOWED_BUCKETS" - labels: storage=fusionv2,project=benchmarking - wait: "AVAILABLE" - overwrite: False diff --git a/06_fusion_snapshots/compute-envs/aws_fusion_snapshots.yml b/06_fusion_snapshots/compute-envs/aws_fusion_snapshots.yml deleted file mode 100644 index cb7f624..0000000 --- a/06_fusion_snapshots/compute-envs/aws_fusion_snapshots.yml +++ /dev/null @@ -1,20 +0,0 @@ -compute-envs: - - type: aws-batch - config-mode: forge - name: "${COMPUTE_ENV_PREFIX}_fusion_snapshots" - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" - credentials: "$AWS_CREDENTIALS" - region: "$AWS_REGION" - work-dir: "$AWS_WORK_DIR" - wave: True - fusion-v2: True - fast-storage: True - snapshots: True - no-ebs-auto-scale: True - provisioning-model: "SPOT" - instance-types: "c6id.4xlarge,c6id.8xlarge,r6id.2xlarge,m6id.4xlarge,c6id.12xlarge,r6id.4xlarge,m6id.8xlarge" - max-cpus: 1000 - allow-buckets: "$AWS_COMPUTE_ENV_ALLOWED_BUCKETS" - labels: storage=fusionv2,project=benchmarking - wait: "AVAILABLE" - overwrite: False diff --git a/06_fusion_snapshots/launch/example_workflow_A_fusion_ondemand.yml b/06_fusion_snapshots/launch/example_workflow_A_fusion_ondemand.yml deleted file mode 100644 index 078db98..0000000 --- a/06_fusion_snapshots/launch/example_workflow_A_fusion_ondemand.yml +++ /dev/null @@ -1,11 +0,0 @@ -launch: - - name: "your_pipeline_name-$TIME-fusion" # Required, specify a name for 
your run - pipeline: "your_pipeline_name" # Required, name of your pipeline on the Launchpad - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" # Required, specify a workspace - compute-env: "${COMPUTE_ENV_PREFIX}_fusion_ondemand" # Required, selects for Fusion CE created in earlier step - # pre-run: "./pre-run.sh" # Optional, specify a pre-run script - # params-file: "./params.yaml" # Optional, specify a pipeline params file - params: # Optional, specify params inline - outdir: '$PIPELINE_OUTDIR_PREFIX/your_pipeline_name/results' # Optional, specify an output directory inline - input: 's3://your-bucket/input/samplesheet.csv' # Optional, specify an input samplesheet inline - \ No newline at end of file diff --git a/06_fusion_snapshots/launch/example_workflow_A_fusion_snapshots.yml b/06_fusion_snapshots/launch/example_workflow_A_fusion_snapshots.yml deleted file mode 100644 index 662d703..0000000 --- a/06_fusion_snapshots/launch/example_workflow_A_fusion_snapshots.yml +++ /dev/null @@ -1,11 +0,0 @@ -launch: - - name: "your_pipeline_name-$TIME-fusion-snapshot" # Required, specify a name for your run - pipeline: "your_pipeline_name" # Required, name of your pipeline on the Launchpad - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" # Required, specify a workspace - compute-env: "${COMPUTE_ENV_PREFIX}_fusion_snapshots" # Required, selects for Fusion CE created in earlier step - # pre-run: "./pre-run.sh" # Optional, specify a pre-run script - # params-file: "./params.yaml" # Optional, specify a pipeline params file - params: # Optional, specify params inline - outdir: '$PIPELINE_OUTDIR_PREFIX/your_pipeline_name/results' # Optional, specify an output directory inline - input: 's3://your-bucket/input/samplesheet.csv' # Optional, specify an input samplesheet inline - \ No newline at end of file diff --git a/06_fusion_snapshots/launch/hello_world_fusion_ondemand.yml b/06_fusion_snapshots/launch/hello_world_fusion_ondemand.yml deleted file mode 100644 index 2368a9c..0000000 --- a/06_fusion_snapshots/launch/hello_world_fusion_ondemand.yml +++ /dev/null @@ -1,5 +0,0 @@ -launch: - - name: "nf-hello-world-fusion-ondemand-$TIME" - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" - compute-env: "${COMPUTE_ENV_PREFIX}_fusion_ondemand" - pipeline: "nf-hello-world-test" diff --git a/06_fusion_snapshots/launch/hello_world_fusion_snapshots.yml b/06_fusion_snapshots/launch/hello_world_fusion_snapshots.yml deleted file mode 100644 index 7a3dfeb..0000000 --- a/06_fusion_snapshots/launch/hello_world_fusion_snapshots.yml +++ /dev/null @@ -1,5 +0,0 @@ -launch: - - name: "nf-hello-world-fusion-snapshots-$TIME" - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" - compute-env: "${COMPUTE_ENV_PREFIX}_fusion_snapshots" - pipeline: "nf-hello-world-test" diff --git a/06_fusion_snapshots/pipelines/example_workflow_A_fusion_ondemand.yml b/06_fusion_snapshots/pipelines/example_workflow_A_fusion_ondemand.yml deleted file mode 100644 index 1bfcded..0000000 --- a/06_fusion_snapshots/pipelines/example_workflow_A_fusion_ondemand.yml +++ /dev/null @@ -1,14 +0,0 @@ -pipelines: - - name: "your_pipeline_name" # Required, specify a name for pipeline on Launchpad - url: "your_pipeline_url" # Required, specify a pipeline URL - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" # Required, specify a workspace - description: "your_pipeline_description" # Optional, specify a pipeline description - compute-env: "${COMPUTE_ENV_PREFIX}_fusion_ondemand" # Required, selects for Fusion CE created in earlier step - profile: "$PIPELINE_PROFILE" # 
Optional, specify a test profile - revision: "your_pipeline_revision" # Required, specify a pipeline revision - config: "./nextflow.config" # Required, specify a config file with process resource labels for costs - # pre-run: "./pre-run.sh" # Optional, specify a pre-run script - # params-file: "./params.yaml" # Optional, specify a pipeline params file - params: # Optional, specify params inline - input: 's3://your-bucket/input/samplesheet.csv' - labels: "fusion_benchmark,fusionondemand" # Optional, specify pipeline labels to organize runs \ No newline at end of file diff --git a/06_fusion_snapshots/pipelines/example_workflow_A_fusion_snapshots.yml b/06_fusion_snapshots/pipelines/example_workflow_A_fusion_snapshots.yml deleted file mode 100644 index 65c930f..0000000 --- a/06_fusion_snapshots/pipelines/example_workflow_A_fusion_snapshots.yml +++ /dev/null @@ -1,14 +0,0 @@ -pipelines: - - name: "your_pipeline_name" # Required, specify a name for pipeline on Launchpad - url: "your_pipeline_url" # Required, specify a pipeline URL - workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME" # Required, specify a workspace - description: "your_pipeline_description" # Optional, specify a pipeline description - compute-env: "${COMPUTE_ENV_PREFIX}_fusion_snapshots" # Required, selects for Fusion CE created in earlier step - profile: "$PIPELINE_PROFILE" # Optional, specify a test profile - revision: "your_pipeline_revision" # Required, specify a pipeline revision - config: "./nextflow_snapshots.config" # Required, specify a config file with process resource labels for costs - # pre-run: "./pre-run.sh" # Optional, specify a pre-run script - # params-file: "./params.yaml" # Optional, specify a pipeline params file - params: # Optional, specify params inline - input: 's3://your-bucket/input/samplesheet.csv' - labels: "fusion_benchmark,fusionv2" # Optional, specify pipeline labels to organize runs \ No newline at end of file diff --git a/06_fusion_snapshots/pipelines/hello_world.yml b/06_fusion_snapshots/pipelines/hello_world.yml deleted file mode 100644 index 10cf6ed..0000000 --- a/06_fusion_snapshots/pipelines/hello_world.yml +++ /dev/null @@ -1,8 +0,0 @@ -pipelines: - - name: "nf-hello-world-test" - url: "https://github.com/nextflow-io/hello" - workspace: '$ORGANIZATION_NAME/$WORKSPACE_NAME' - description: "Classic Hello World script in Nextflow language." 
- compute-env: "${COMPUTE_ENV_PREFIX}_fusion_snapshots" - revision: "master" - overwrite: True diff --git a/06_fusion_snapshots/pipelines/nextflow.config b/06_fusion_snapshots/pipelines/nextflow.config deleted file mode 100644 index 5974752..0000000 --- a/06_fusion_snapshots/pipelines/nextflow.config +++ /dev/null @@ -1,20 +0,0 @@ -process { - resourceLabels = {[ - uniqueRunId: System.getenv("TOWER_WORKFLOW_ID"), - pipelineProcess: task.process.toString(), - pipelineTag: task.tag.toString(), - pipelineCPUs: task.cpus.toString(), - pipelineMemory: task.memory.toString(), - pipelineTaskAttempt: task.attempt.toString(), - pipelineContainer: task.container.toString(), - taskHash: task.hash.toString(), - pipelineUser: workflow.userName.toString(), - pipelineRunName: workflow.runName.toString(), - pipelineSessionId: workflow.sessionId.toString(), - pipelineResume: workflow.resume.toString(), - pipelineRevision: workflow.revision.toString(), - pipelineCommitId: workflow.commitId.toString(), - pipelineRepository: workflow.repository.toString(), - pipelineName: workflow.manifest.name.toString() - ]} -} \ No newline at end of file diff --git a/06_fusion_snapshots/pipelines/nextflow_snapshots.config b/06_fusion_snapshots/pipelines/nextflow_snapshots.config deleted file mode 100644 index 221d332..0000000 --- a/06_fusion_snapshots/pipelines/nextflow_snapshots.config +++ /dev/null @@ -1,25 +0,0 @@ -process { - resourceLabels = {[ - uniqueRunId: System.getenv("TOWER_WORKFLOW_ID"), - pipelineProcess: task.process.toString(), - pipelineTag: task.tag.toString(), - pipelineCPUs: task.cpus.toString(), - pipelineMemory: task.memory.toString(), - pipelineTaskAttempt: task.attempt.toString(), - pipelineContainer: task.container.toString(), - taskHash: task.hash.toString(), - pipelineUser: workflow.userName.toString(), - pipelineRunName: workflow.runName.toString(), - pipelineSessionId: workflow.sessionId.toString(), - pipelineResume: workflow.resume.toString(), - pipelineRevision: workflow.revision.toString(), - pipelineCommitId: workflow.commitId.toString(), - pipelineRepository: workflow.repository.toString(), - pipelineName: workflow.manifest.name.toString() - ]} -} - -aws.batch.maxSpotAttempts = 5 - -// Resource limits set to match the largest available instance type (m6id.8xlarge: 32 cores, 128GB memory) -process.resourceLimits = [cpus: 32, memory: '60.GB', time: '16.h'] \ No newline at end of file diff --git a/06_fusion_snapshots/pipelines/pre-run.sh b/06_fusion_snapshots/pipelines/pre-run.sh deleted file mode 100644 index e69de29..0000000 diff --git a/06_fusion_snapshots/report/dataset/nf-aggregate-dataset.yml b/06_fusion_snapshots/report/dataset/nf-aggregate-dataset.yml deleted file mode 100644 index 4d9b1f9..0000000 --- a/06_fusion_snapshots/report/dataset/nf-aggregate-dataset.yml +++ /dev/null @@ -1,7 +0,0 @@ -datasets: - - name: 'fusion-snapshots-benchmark-samplesheet' - description: 'Samplesheet with run information to compile Fusion snapshots benchmarking report.' 
- header: true
- workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
- file-path: './nf_aggregate_samplesheet.csv'
- overwrite: True
\ No newline at end of file
diff --git a/06_fusion_snapshots/report/launch/nf-aggregate-launch.yml b/06_fusion_snapshots/report/launch/nf-aggregate-launch.yml
deleted file mode 100644
index b4731ce..0000000
--- a/06_fusion_snapshots/report/launch/nf-aggregate-launch.yml
+++ /dev/null
@@ -1,6 +0,0 @@
-launch:
- - name: 'nf-aggregate-fusion-snapshots-benchmark-report'
- workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
- pipeline: 'nf-aggregate-fusion-snapshots-benchmark'
- params:
- outdir: '$PIPELINE_OUTDIR_PREFIX/nf-aggregate-fusion-snapshots-benchmark/results'
diff --git a/06_fusion_snapshots/report/nf_aggregate_samplesheet.csv b/06_fusion_snapshots/report/nf_aggregate_samplesheet.csv
deleted file mode 100644
index 010b311..0000000
--- a/06_fusion_snapshots/report/nf_aggregate_samplesheet.csv
+++ /dev/null
@@ -1,3 +0,0 @@
-id,workspace,group
-run_id_1,org/workspace,fusion_snapshots
-run_id_2,org/workspace,fusion_ondemand
\ No newline at end of file
diff --git a/06_fusion_snapshots/report/pipelines/nextflow.config b/06_fusion_snapshots/report/pipelines/nextflow.config
deleted file mode 100644
index b9eb3dd..0000000
--- a/06_fusion_snapshots/report/pipelines/nextflow.config
+++ /dev/null
@@ -1,8 +0,0 @@
-process.maxRetries = 1
-
-params {
- seqera_api_endpoint = 'https://api.cloud.seqera.io'
- generate_benchmark_report = true
- benchmark_aws_cur_report = null
- remove_failed_tasks = false
-}
\ No newline at end of file
diff --git a/06_fusion_snapshots/report/pipelines/nf-aggregate-pipeline.yml b/06_fusion_snapshots/report/pipelines/nf-aggregate-pipeline.yml
deleted file mode 100644
index 2184ad7..0000000
--- a/06_fusion_snapshots/report/pipelines/nf-aggregate-pipeline.yml
+++ /dev/null
@@ -1,12 +0,0 @@
-pipelines:
- - name: "nf-aggregate-fusion-snapshots-benchmark"
- url: "https://github.com/seqeralabs/nf-aggregate"
- workspace: "$ORGANIZATION_NAME/$WORKSPACE_NAME"
- description: "seqeralabs/nf-aggregate is a Nextflow pipeline to aggregate pertinent metrics across pipeline runs on the Seqera Platform."
- compute-env: "${COMPUTE_ENV_PREFIX}_fusion_snapshots"
- revision: "0.7.0"
- config: "./pipelines/nextflow.config"
- params: # Optional, specify params inline
- input: '$DATASET_URL'
- labels: "fusion_snapshots_benchmark"
- pre-run: './pre-run.txt'
diff --git a/06_fusion_snapshots/report/pre-run.txt b/06_fusion_snapshots/report/pre-run.txt
deleted file mode 100644
index 31911b7..0000000
--- a/06_fusion_snapshots/report/pre-run.txt
+++ /dev/null
@@ -1 +0,0 @@
-export NXF_VER=24.10.4
\ No newline at end of file
diff --git a/06_fusion_snapshots/setup/env.sh b/06_fusion_snapshots/setup/env.sh
deleted file mode 100644
index d1847ce..0000000
--- a/06_fusion_snapshots/setup/env.sh
+++ /dev/null
@@ -1,15 +0,0 @@
-# Workspace details
-export ORGANIZATION_NAME=seqeralabs
-export WORKSPACE_NAME=scidev-aws ## [CHANGE ME] Seqera Platform Workspace name
-
-# Pipeline details
-export PIPELINE_OUTDIR_PREFIX="s3://scidev-playground-eu-west-2/snapshots_benchmark" ## [CHANGE ME] Pipeline results will be written to a subfolder of this path. You can set it to the work directory defined in your Compute Environment
-export PIPELINE_PROFILE='test' ## [OPTIONAL - CHANGE ME] Config profile to run your pipeline with
-export TIME=$(date +"%Y%m%d-%H%M%S")
-
-# AWS Compute Environment details
-export COMPUTE_ENV_PREFIX=snapshots_benchmark ## [CHANGE ME] Informative prefix for naming new CEs; this could include the project name and region
-export AWS_CREDENTIALS='aws-scidev-playground' ## [CHANGE ME] Name of the AWS credentials added to your Workspace
-export AWS_REGION='eu-west-2' ## [CHANGE ME] AWS Batch region for compute
-export AWS_WORK_DIR='s3://scidev-playground-eu-west-2' ## [CHANGE ME] Path to the default Nextflow work directory
-export AWS_COMPUTE_ENV_ALLOWED_BUCKETS="s3://scidev-playground-eu-west-2" ## [CHANGE ME] List of allowed S3 buckets that you want to enable r/w to
diff --git a/README.md b/README.md
index e028fdf..fe6b05a 100644
--- a/README.md
+++ b/README.md
@@ -22,10 +22,6 @@ This tutorial has been split up into 6 main components that you will need to com
 3. [Setup pipelines for benchmarking](03_setup_pipelines/README.md)
 4. [Run benchmarks](04_run_benchmarks/README.md)
 5. [Generate benchmarking reports](05_generate_report/README.md)
-6. [Fusion Snapshot benchmarking](./06_fusion_snapshots/README.md) (Optional)
-
-**Note:** The Fusion Snapshot benchmarking is an optional advanced feature. Contact your Seqera team if you are interested in testing Fusion Snapshots.
-
 ## Preparation
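To illustrate how the variables in `setup/env.sh` above feed into the seqerakit YAML files, here is a minimal sanity-check sketch. It assumes seqerakit's `--dryrun` option, which prints the underlying `tw` commands without executing them:

```bash
# Load the workspace and compute environment variables referenced by the YAML files
source ./setup/env.sh

# Dry-run a simple launch YAML to confirm variable substitution before submitting anything
seqerakit --dryrun ./launch/hello_world_fusion_snapshots.yml
```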