diff --git a/platform-cloud/docs/compute-envs/aws-batch.md b/platform-cloud/docs/compute-envs/aws-batch.md
index d61cd4c2a..26efc4ee2 100644
--- a/platform-cloud/docs/compute-envs/aws-batch.md
+++ b/platform-cloud/docs/compute-envs/aws-batch.md
@@ -14,14 +14,62 @@ The AWS Batch service quota for job queues is 50 per account. For more informati
 There are two ways to create a Seqera Platform compute environment for AWS Batch:
 
-- [**Automatic**](#automatic-configuration-of-batch-resources): This option lets Seqera automatically create the required AWS Batch resources in your AWS account, using an internal tool with Seqera Platform called "Forge". This removes the need to set up your AWS Batch infrastructure manually. Resources are also automatically deleted when the compute environment is removed from Platform.
-- [**Manual**](#manual-configuration-of-batch-resources): This option lets Seqera use existing AWS Batch resources created manually.
+- [**Automatically**](#automatic-configuration-of-batch-resources): this option lets Seqera automatically create the required AWS Batch resources in your AWS account, using an internal tool within Seqera Platform called "Forge". This removes the need to set up your AWS Batch infrastructure manually. Resources are also automatically deleted when the compute environment is removed from Platform.
+- [**Manually**](#manual-configuration-of-batch-resources): this option lets Seqera use existing AWS Batch resources that you have previously created.
 
-Both options require specific IAM permissions to function correctly.
+Both options require specific IAM permissions to function correctly, as well as access to an S3 bucket or EFS/FSx file system to store intermediate Nextflow files.
+
+## S3 bucket creation
+
+AWS S3 (Simple Storage Service) is a type of **object storage**. To access input and output files using Seqera products like [Studios](../studios/overview) and [Data Explorer](../data/data-explorer), create one or more **S3 buckets**.
An S3 bucket can also be used to store the intermediate results of your Nextflow pipelines, as an alternative to using EFS or FSx file systems.
+
+1. Navigate to the [AWS S3 service](https://console.aws.amazon.com/s3/home).
+1. In the top right of the page, select the same region where you plan to create your AWS Batch compute environment.
+1. Select **Create bucket**.
+1. Enter a unique name for your bucket.
+1. Leave the rest of the options as default and select **Create bucket**.
+
+:::note
+S3 can be used by Nextflow for the storage of intermediate files. In production pipelines, this can amount to a lot of data. To reduce costs, consider adding a retention policy when you create a bucket, such as automatically deleting intermediate files after 30 days. See the [AWS documentation](https://aws.amazon.com/premiumsupport/knowledge-center/s3-empty-bucket-lifecycle-rule/) for more information.
+:::
+
+## EFS or FSx file system creation
+
+[AWS Elastic File System (EFS)](https://aws.amazon.com/efs/) and [AWS FSx](https://aws.amazon.com/fsx/) are types of **file storage** that can be used as a Nextflow work directory to store intermediate files, as an alternative to using S3 buckets.
+
+:::note
+Using EFS as the Platform work directory is incompatible with Studios.
+:::
+
+To use EFS or FSx as your Nextflow work directory, create an EFS or FSx file system in the same region where you plan to create your AWS Batch compute environment, to avoid high data transfer costs.
+
+### Creating an EFS file system
+
+Visit the [EFS console](https://console.aws.amazon.com/efs/home) to create a new EFS file system.
+
+1. Select **Create file system**.
+1. Optionally give it a name, then select the VPC where your AWS Batch compute environment will be created.
+1. Leave the rest of the options as default and select **Create file system**.
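
If you prefer to script this step, the console procedure above can be sketched with the AWS CLI. The region, IDs, and names below are placeholders, not values from this guide:

```shell
# Sketch only: create an EFS file system and expose it in one subnet.
# The region, subnet, and security group IDs are placeholders.
aws efs create-file-system \
  --region eu-west-1 \
  --performance-mode generalPurpose \
  --encrypted \
  --tags Key=Name,Value=seqera-efs

# Each subnet used by the compute environment needs a mount target.
aws efs create-mount-target \
  --region eu-west-1 \
  --file-system-id fs-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --security-groups sg-0123456789abcdef0
```

The mount target's security group must allow NFS traffic from the compute environment, as described in the security group notes later in this guide.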
+
+### Creating an FSx file system
+
+Visit the [FSx console](https://console.aws.amazon.com/fsx/home) to create a new FSx file system.
+
+1. Select **Create file system**.
+1. Select the desired FSx file system type (e.g., FSx for Lustre).
+1. Follow the prompts to configure the file system according to your requirements, then select **Next**.
+1. Review the configuration and select **Create file system**.
+
+Make sure the [Lustre client](https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html) is available in the AMIs used by your AWS Batch compute environment to allow mounting FSx file systems.
 
 ## Required Platform IAM permissions
 
-To create and launch pipelines or Studio sessions with this compute environment type, you must provide an IAM user with specific permissions. Some permissions are mandatory for the compute environment to be created and function correctly, while others are optional and used for example to provide list of values to pick from in the Platform UI.
+To create and launch pipelines, explore buckets with Data Explorer, or run Studio sessions with the AWS Batch compute environment, you must provide an IAM user with specific permissions. Some permissions are mandatory for the compute environment to be created and function correctly, while others are optional and are used, for example, to populate lists of values to pick from in the Platform UI.
+
+Permissions can be attached directly to an [IAM user](#iam-user-creation), or to an [IAM role](#iam-role-creation-optional) that the IAM user can assume when accessing AWS resources. A permissive and broad policy with all the required permissions is provided here for a quick start. However, we recommend following the principle of least privilege and only granting the necessary permissions for your use case, as shown in the following sections.
@@ -351,7 +399,11 @@ The policy can be scoped down to only allow limited Read/Write permissions in ce
 }
 ```
 
-### IAM roles creation (optional)
+:::note
+If you opted to create a separate S3 bucket used only for Nextflow work directories, the IAM user does not need access to it. If Platform is allowed to manage resources (using Batch Forge), the automatically created IAM roles will have the necessary permissions. If you set up the compute environment manually, you can create the required IAM roles with the necessary permissions as detailed in the [manual AWS Batch setup documentation](../enterprise/advanced-topics/manual-aws-batch-setup).
+:::
+
+### IAM roles for AWS Batch (optional)
 
 Seqera can automatically create the IAM roles needed to interact with AWS Batch and other AWS services. You can opt out of this behavior by creating the required IAM roles manually and providing their ARNs during compute environment creation in Platform: refer to the [documentation](../enterprise/advanced-topics/manual-aws-batch-setup) for more details on how to manually set up IAM roles.
 
@@ -459,7 +511,7 @@ This section of the policy is optional and can be omitted if EFS file systems ar
 
 ### Pipeline secrets (optional)
 
-Seqera can synchronize [pipeline secrets](../secrets/overview) defined on the Platform workspace with AWS Secrets Manager, which requires additional permissions on the IAM User. If you do not plan to use pipeline secrets, you can omit this section of the policy.
+Seqera can synchronize [pipeline secrets](../secrets/overview) defined on the Platform workspace with AWS Secrets Manager, which requires additional permissions on the IAM user. If you do not plan to use pipeline secrets, you can omit this section of the policy.
 
 The listing of secrets cannot be restricted, but the management actions can be restricted to only allow managing secrets in a specific account and region, which must be the same region where the pipeline runs.
Note that Seqera only creates secrets with the `tower-` prefix. @@ -486,107 +538,125 @@ The listing of secrets cannot be restricted, but the management actions can be r To successfully use pipeline secrets, the IAM roles manually created must follow the steps detailed in the [documentation](../secrets/overview#aws-secrets-manager-integration). -## Automatic configuration of Batch resources - -Seqera automates the configuration of an [AWS Batch](https://aws.amazon.com/batch/) compute environment and the queues required for deploying Nextflow pipelines. +## Create the IAM policy -:::caution -Seqera automatically creates resources that you may be charged for in your AWS account. See [Cloud costs](../monitoring/cloud-costs) for guidelines to manage cloud resources effectively and prevent unexpected costs. -::: +The policy above must be created in the AWS account where the AWS Batch resources need to be created. -### IAM - -Batch Forge requires an Identity and Access Management (IAM) user with the permissions listed in [this policy file](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/forge-policy.json). These authorizations are more permissive than those required to only [launch](https://github.com/seqeralabs/nf-tower-aws/blob/master/launch/launch-policy.json) a pipeline, since Seqera needs to manage AWS resources on your behalf. Note that launch permissions also require the S3 storage write permissions in [this policy file](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/README.md#s3-access-optional). - -We recommend that you create separate IAM policies for Batch Forge and launch permissions using the policy files above. These policies can then be assigned to the Seqera IAM user. - -**Create Seqera IAM policies** - -1. Open the [AWS IAM console](https://console.aws.amazon.com/iam). +1. Open the [AWS IAM console](https://console.aws.amazon.com/iam) in the account where you want to create the AWS Batch resources. 1. 
From the left navigation menu, select **Policies** under **Access management**.
 1. Select **Create policy**.
-1. On the **Create policy** page, select the **JSON** tab.
-
-1. Copy the contents of your policy JSON file ([Forge](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/forge-policy.json) or [Launch](https://github.com/seqeralabs/nf-tower-aws/blob/master/launch/launch-policy.json), depending on the policy being created) and replace the default text in the policy editor area under the JSON tab.
-
-1. To create a Launch user, you must also create the [S3 bucket write policy](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/README.md#s3-access-optional) separately to attach to your Launch user.
-
-1. To use Data Explorer and Studios, you must create the [data policy](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/README.md#aws-batch-management) separately to attach to your Platform users.
-
-1. Select **Next: Tags**.
-1. Select **Next: Review**.
-1. Enter a name and description for the policy on the **Review policy** page, then select **Create policy**.
-1. Repeat these steps for both the `forge-policy.json` and `launch-policy.json` files. For a Launch user, also create the `s3-bucket-write-policy.json` listed in step 5 above.
-
-**Create an IAM user**
+1. In the **Policy editor** section, select the **JSON** tab.
+1. Following the instructions detailed in the [IAM permissions breakdown section](#required-platform-iam-permissions), replace the default text in the policy editor area under the **JSON** tab with a policy adapted to your use case, then select **Next**.
+1. Enter a name and description for the policy on the **Review and create** page, then select **Create policy**.
+
+## IAM user creation
+
+Seqera requires an Identity and Access Management (IAM) user to create and manage AWS Batch resources in your AWS account.
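
If you prefer to script the policy and user setup described in these sections, a hypothetical AWS CLI sketch follows. The policy file name, user name, and account ID are placeholders, not values from this guide:

```shell
# Sketch only: create the policy, the user, and attach one to the other.
# seqera-policy.json is the policy document you prepared earlier.
aws iam create-policy \
  --policy-name seqera-platform \
  --policy-document file://seqera-policy.json

aws iam create-user --user-name seqera

# The account ID in the ARN is a placeholder.
aws iam attach-user-policy \
  --user-name seqera \
  --policy-arn arn:aws:iam::123456789012:policy/seqera-platform

# Generate the access key pair to paste into the Seqera credentials form.
aws iam create-access-key --user-name seqera
```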
+
+Depending on whether you choose to let Seqera automatically create the required AWS Batch resources in your account, or prefer to set them up manually, the IAM user must have specific permissions, as detailed in the [Required Platform IAM permissions](#required-platform-iam-permissions) section.
+
+Alternatively, in certain scenarios, for example when multiple users need to access the same AWS account and provision AWS Batch resources, you can create an IAM role with the required permissions and allow the IAM user to assume that role when accessing AWS resources, as detailed in the [IAM role creation (optional)](#iam-role-creation-optional) section.
+
+### Create an IAM user
+
+1. From the [AWS IAM console](https://console.aws.amazon.com/iam), select **Users** in the left navigation menu, then select **Create User** at the top right of the page.
+1. Enter a name for your user (e.g., _seqera_) and select **Next**.
+1. Under **Permission options**, select **Attach policies directly**, then search for and select the policy created above, and select **Next**.
+    * If you prefer to make the IAM user assume a role to manage AWS resources (see the [IAM role creation (optional)](#iam-role-creation-optional) section), create a policy with the following content, replace the placeholder account ID and role name with those of the role you created, and attach it to the IAM user:
+
+    ```json
+    {
+        "Sid": "AssumeRoleToManageBatchResources",
+        "Effect": "Allow",
+        "Action": "sts:AssumeRole",
+        "Resource": "arn:aws:iam::<account-id>:role/<role-name>"
+    }
+    ```
+1. On the last page, review the user details and select **Create user**.
+
+The user has now been created.
The most up-to-date instructions for creating an IAM user can be found in the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html).
+
+### Obtain IAM user credentials
+
+To get the credentials needed to connect Seqera to your AWS account, follow these steps:
+
+1. From the [AWS IAM console](https://console.aws.amazon.com/iam), select **Users** in the left navigation menu, then select the newly created user from the users table.
+1. Select the **Security credentials** tab, then select **Create access key** under the **Access keys** section.
+1. In the **Use case** dialog that appears, select **Command line interface (CLI)**, tick the confirmation checkbox at the bottom to acknowledge that you want to proceed with creating an access key, and select **Next**.
+1. Optionally provide a description for the access key, such as the reason for creating it, then select **Create access key**.
+1. Save the **Access key** and **Secret access key** in a secure location, as you will need to provide them when creating credentials in Seqera.
+
+## IAM role creation (optional)
+
+Rather than attaching permissions directly to the IAM user, you can create an IAM role with the required permissions and allow the IAM user to assume that role when accessing AWS resources. This is useful when multiple IAM users access the same AWS account: the permissions to operate on the resources are granted to a single centralized role.
+
+1. From the [AWS IAM console](https://console.aws.amazon.com/iam), select **Roles** in the left navigation menu, then select **Create role** at the top right of the page.
+1. Select **Custom trust policy** as the type of trusted entity, provide the following policy, replacing the AWS principal with the ARN of the IAM user created in the [IAM user creation](#iam-user-creation) section, then select **Next**.
+
+    ```json
+    {
+        "Version": "2012-10-17",
+        "Statement": [
+            {
+                "Effect": "Allow",
+                "Principal": {
+                    "AWS": [
+                        "arn:aws:iam::<account-id>:user/<user-name>"
+                    ]
+                },
+                "Action": "sts:AssumeRole"
+            }
+        ]
+    }
+    ```
+1. On the **Permissions** page, search for and select the policy created in the [Create the IAM policy](#create-the-iam-policy) section, then select **Next**.
+1. Give the role a name and optionally a description, review the details of the role, optionally provide tags to help you identify the role, then select **Create role**.
+
+Multiple users can be specified in the trust policy by adding more ARNs to the `Principal` section.
 
-1. From the [AWS IAM console](https://console.aws.amazon.com/iam), select **Users** in the left navigation menu, then select **Add User** at the top right of the page.
-1. Enter a name for your user (e.g., _seqera_) and select the **Programmatic access** type.
-1. Select **Next: Permissions**.
-1. Select **Next: Tags > Next: Review > Create User**.
-    :::note
-    For the time being, you can ignore the "user has no permissions" warning. Permissions will be applied using the **IAM Policy**.
-    :::
-1. Save the **Access key ID** and **Secret access key** in a secure location as you will use these when creating credentials in Seqera.
-1. After you have saved the keys, select **Close**.
-1. Back in the users table, select the newly created user, then select **Add permissions** under the **Permissions** tab.
-1. Select **Attach existing policies**, then search for and select each of the policies created above.
-1. Select **Next: Review > Add permissions**.
-
-### S3 Bucket
-
-S3 (Simple Storage Service) is a type of **object storage**. To access files and store the results for your pipelines, create an **S3 bucket** that your Seqera IAM user can access.
-
-**Create an S3 bucket**
+## Automatic configuration of Batch resources
 
-1. Navigate to the [S3 service](https://console.aws.amazon.com/s3/home).
-1. Select **Create New Bucket**.
-1. 
Enter a unique name for your bucket and select a region.
-    :::note
-    To maximize data transfer resilience and minimize cost, storage should be in the same region as compute.
-    :::
-1. Select the default options in **Configure options**.
-1. Select the default options in **Set permissions**.
-1. Review and select **Create bucket**.
+Seqera automates the configuration of an [AWS Batch](https://aws.amazon.com/batch/) compute environment and the queues required for deploying Nextflow pipelines.
 
-:::note
-S3 is used by Nextflow for the storage of intermediate files. In production pipelines, this can amount to a lot of data. To reduce costs, consider using a retention policy when creating a bucket, such as automatically deleting intermediate files after 30 days. See [here](https://aws.amazon.com/premiumsupport/knowledge-center/s3-empty-bucket-lifecycle-rule/) for more information.
+:::caution
+AWS Batch creates resources that you may be charged for in your AWS account. See [Cloud costs](../monitoring/cloud-costs) for guidelines to manage cloud resources effectively and prevent unexpected costs.
 :::
 
-### Batch Forge compute environment
-
-Batch Forge automates the configuration of an [AWS Batch](https://aws.amazon.com/batch/) compute environment and the queues required to deploy Nextflow pipelines. After your IAM user and S3 bucket have been set up, create a new **AWS Batch** compute environment in Seqera.
-
-#### Created resources
+### AWS Batch
+
+After your IAM user and S3 bucket have been set up, create a new **AWS Batch** compute environment in Seqera.
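
Once the compute environment described below has been created, you can verify the AWS Batch resources that were provisioned in your account. A hypothetical AWS CLI sketch (adjust the region to match your compute environment):

```shell
# Sketch only: list the Batch resources provisioned in your account.
aws batch describe-compute-environments --region eu-west-1 \
  --query 'computeEnvironments[].computeEnvironmentName'

aws batch describe-job-queues --region eu-west-1 \
  --query 'jobQueues[].jobQueueName'
```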
-Batch Forge will create the head and compute [job queues](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) and their respective [compute environments](https://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) where jobs will be executed. The job queues are configured with [job state limit actions](https://docs.aws.amazon.com/batch/latest/APIReference/API_JobStateTimeLimitAction.html) to automatically purge jobs that cannot be scheduled on any node type available for the compute environment. +#### Create a Seqera AWS Batch compute environment -Depending on the provided configuration, Forge might also create IAM roles for Nextflow head job execution, EFS or FSx filesystems, and CloudWatch log groups. +Seqera will create the head and compute [job queues](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) and their respective [compute environments](https://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) where jobs will be executed. The job queues are configured with [job state limit actions](https://docs.aws.amazon.com/batch/latest/APIReference/API_JobStateTimeLimitAction.html) to automatically purge jobs that cannot be scheduled on any node type available for the compute environment. +Depending on the provided configuration in the UI, Seqera might also create IAM roles for Nextflow head job execution, CloudWatch log groups, EFS or FSx filesystems, etc. -**Create a Batch Forge AWS Batch compute environment** - -1. In a workspace, select **Compute environments > New environment**. +1. After logging in to [Seqera](https://cloud.seqera.io) and selecting a workspace from the dropdown menu at the top of the page, select **Compute environments** from the navigation menu. +1. Select **Add compute environment**. 1. Enter a descriptive name for this environment, e.g., _AWS Batch Spot (eu-west-1)_. 1. Select **AWS Batch** as the target platform. -1. 
From the **Credentials** drop-down, select existing AWS credentials, or select **+** to add new credentials. If you're using existing credentials, skip to step 9.
    :::note
    You can create multiple credentials in your Seqera environment. See [Credentials](../credentials/overview).
    :::
 1. Enter a name, e.g., _AWS Credentials_.
-1. Add the **Access key** and **Secret key**. These are the keys you saved previously when you created the Seqera IAM user.
+1. Add the **Access key** and **Secret key** you [previously obtained](#obtain-iam-user-credentials) when you created the Seqera IAM user.
 1. (Optional) Under **Assume role**, specify the IAM role to be assumed by the Seqera IAM user to access the compute environment's AWS resources.
    :::note
-    When using AWS keys without an assumed role, the associated AWS user account must have [Launch](https://github.com/seqeralabs/nf-tower-aws/tree/master/launch) and [Forge](https://github.com/seqeralabs/nf-tower-aws/tree/master/forge) permissions. When an assumed role is provided, the keys are only used to retrieve temporary credentials impersonating the role specified. In this case, [Launch](https://github.com/seqeralabs/nf-tower-aws/tree/master/launch) and [Forge](https://github.com/seqeralabs/nf-tower-aws/tree/master/forge) permissions must be granted to the role instead of the user account.
+    When using AWS keys without an assumed role, the associated AWS user must have been granted permissions to operate on the cloud resources directly. When an assumed role is provided, the IAM user keys are only used to retrieve temporary credentials impersonating the specified role. This is useful when, for example, multiple IAM users access the same AWS account and the permissions to operate on the resources are granted only to the role.
    :::
Select a **Region**, e.g., _eu-west-1 - Europe (Ireland)_.
-1. Enter your S3 bucket path in the **Pipeline work directory** field, e.g., `s3://seqera-bucket`. This bucket must be in the same region chosen in the previous step.
+1. Select a **Region**, e.g., _eu-west-1 - Europe (Ireland)_. This region must match the location of the S3 bucket or EFS/FSx file system you plan to use as your work directory.
+1. In the **Pipeline work directory** field, type or select from the dropdown menu the S3 bucket you [previously created](#s3-bucket-creation), e.g., `s3://seqera-bucket`. The work directory can be customized to point to a folder inside the bucket where Nextflow intermediate files will be stored, e.g., `s3://seqera-bucket/nextflow-workdir`. The bucket must be located in the same region chosen in the previous step.
+
    :::note
    When you specify an S3 bucket as your work directory, this bucket is used for the Nextflow [cloud cache](https://www.nextflow.io/docs/latest/cache-and-resume.html#cache-stores) by default. Seqera adds a `cloudcache` block to the Nextflow configuration file for all runs executed with this compute environment. This block includes the path to a `cloudcache` folder in your work directory, e.g., `s3://seqera-bucket/cloudcache/.cache`. You can specify an alternative cache location with the **Nextflow config file** field on the pipeline [launch](../launch/launchpad#launch-form) form.
    :::
+
+    Similarly, you can specify a path in an EFS or FSx file system as your work directory. When using EFS or FSx, scroll down to the **EFS file system** or **FSx for Lustre** sections to either specify an existing file system ID or let Seqera create a new file system for you automatically. Read the notes below on how to set up EFS or FSx.
+
    :::warning
    Using an EFS file system as your work directory is currently incompatible with [Studios](../studios/overview), and will result in errors with checkpoints and mounted data.
    :::
+
Select **Enable Wave containers** to facilitate access to private container repositories and provision containers in your pipelines using the Wave containers service. See [Wave containers](https://www.nextflow.io/docs/latest/wave.html) for more information.
 
 1. Select **Enable Fusion v2** to allow access to your S3-hosted data via the [Fusion v2](https://docs.seqera.io/fusion) virtual distributed file system. This speeds up most data operations. The Fusion v2 file system requires Wave containers to be enabled. See [Fusion file system](../supported_software/fusion/overview) for configuration details.
 
@@ -626,8 +696,8 @@ Depending on the provided configuration, Forge might also create IAM roles for N
 
 1. Select **Enable Fusion Snapshots (beta)** to enable Fusion to automatically restore jobs that are interrupted when an AWS Spot instance reclamation occurs. Requires Fusion v2. See [Fusion Snapshots](https://docs.seqera.io/fusion/guide/snapshots) for more information.
 
-1. Set the **Config mode** to **Batch Forge**.
-1. Select a **Provisioning model**. In most cases, this will be **Spot**. You can specify an allocation strategy and instance types under [**Advanced options**](#advanced-options). If advanced options are omitted, Seqera Platform 23.2 and later versions default to `BEST_FIT_PROGRESSIVE` for On-Demand and `SPOT_PRICE_CAPACITY_OPTIMIZED` for Spot compute environments.
+1. Set the **Config mode** to **Batch Forge** to allow Seqera Platform to manage AWS Batch compute environments using the Forge tool.
+1. Select a **Provisioning model**. To minimize compute costs, select **Spot**. You can specify an allocation strategy and instance types under [**Advanced options**](#advanced-options). If advanced options are omitted, Seqera Platform 23.2 and later versions default to `BEST_FIT_PROGRESSIVE` for On-Demand and `SPOT_PRICE_CAPACITY_OPTIMIZED` for Spot compute environments.
:::note
 You can create a compute environment that launches either Spot or On-Demand instances. Spot instances can cost as little as 20% of On-Demand instances, and with Nextflow's ability to automatically relaunch failed tasks, Spot is almost always the recommended provisioning model. Note, however, that when choosing Spot instances, Seqera will also create a dedicated queue for running the main Nextflow job using a single On-Demand instance to prevent any execution interruptions.
 :::
@@ -655,19 +725,59 @@ Depending on the provided configuration, Forge might also create IAM roles for N
    Graviton requires Fargate, Wave containers, and Fusion v2 file system to be enabled. This feature is not compatible with GPU-based architecture.
    :::
 1. Enter any additional **Allowed S3 buckets** that your workflows require to read input data or write output data. The **Pipeline work directory** bucket above is added by default to the list of **Allowed S3 buckets**.
-1. To use **EFS**, you can either select **Use existing EFS file system** and specify an existing EFS instance, or select **Create new EFS file system** to create one. To use the EFS file system as your work directory, specify `/work` in the **Pipeline work directory** field (step 8 of this guide).
+1. To use an **EFS** file system in your pipeline, you can either select **Use existing EFS file system** and specify an existing EFS instance, or select **Create new EFS file system** to create one.
+
+    To use the EFS file system as the work directory of the compute environment, specify `/work` in the **Pipeline work directory** field (step 10 of this guide).
    - To use an existing EFS file system, enter the **EFS file system id** and **EFS mount path**. This is the path where the EFS volume is accessible to the compute environment. For simplicity, we recommend that you use `/mnt/efs` as the EFS mount path.
    - To create a new EFS file system, enter the **EFS mount path**. We advise that you specify `/mnt/efs` as the EFS mount path.
- EFS file systems created by Batch Forge are automatically tagged in AWS with `Name=TowerForge-<compute-env-id>`, with `<compute-env-id>` being the compute environment ID. Any manually-added resource label with the key `Name` (capital N) will override the automatically-assigned `TowerForge-<compute-env-id>` label.
+    - A custom EC2 security group needs to be configured to allow the compute environment to access the EFS file system.
+        * Visit the [AWS Console for Security groups](https://console.aws.amazon.com/ec2/home?#SecurityGroups) and switch to the region where your workload will run.
+        * Select **Create security group**.
+        * Enter a relevant name like `seqera-efs-access-sg` and a description, e.g., _EFS access for Seqera Batch compute environment_.
+        * Empty both the **Inbound rules** and **Outbound rules** sections by deleting the default rules.
+        * Optionally add **Tags** to the security group, then select **Create security group**.
+        * After creating the security group, select it from the security groups list, then select the **Inbound rules** tab and select **Edit inbound rules**.
+        * Select **Add rule** and configure the new rule as follows:
+            - **Type**: `NFS`
+            - **Source**: `Custom`, and enter the security group ID that you're editing (you can search for it by name, e.g., `seqera-efs-access-sg`). This allows resources associated with the same security group to communicate with each other.
+        * Select **Save rules** to finalize the inbound rule configuration.
+        * Repeat the same steps to add an outbound rule that allows all outbound traffic: set the type to `All traffic` and the destination to `Anywhere-IPv4`/`Anywhere-IPv6`.
+        * See the [AWS documentation about EFS security groups](https://docs.aws.amazon.com/efs/latest/ug/network-access.html) for more information.
+        * The security group must then be specified in the **Advanced options** below to allow the compute environment to access the EFS file system.
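
    The console steps above can be sketched with the AWS CLI as follows. The VPC ID and group name are placeholders, not values from this guide:

    ```shell
    # Sketch only: self-referencing security group for EFS (NFS, port 2049).
    SG_ID=$(aws ec2 create-security-group \
      --group-name seqera-efs-access-sg \
      --description "EFS access for Seqera Batch compute environment" \
      --vpc-id vpc-0123456789abcdef0 \
      --query 'GroupId' --output text)

    # Allow inbound NFS traffic from members of the same security group.
    aws ec2 authorize-security-group-ingress \
      --group-id "$SG_ID" \
      --protocol tcp --port 2049 \
      --source-group "$SG_ID"
    ```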
:::warning
    EFS file systems are compatible with [Studios](../studios/overview), **except** when using the EFS file system as your **work directory**.
    :::
-1. To use **FSx for Lustre**, you can either select **Use existing FSx file system** and specify an existing FSx instance, or select **Create new FSx file system** to create one. To use the FSx file system as your work directory, specify `/work` in the **Pipeline work directory** field (step 8 of this guide).
+
+1. To use an **FSx for Lustre** file system in your pipeline, you can either select **Use existing FSx file system** and specify an existing FSx instance, or select **Create new FSx file system** to create one.
+
+    To use the FSx file system as your work directory, specify `/work` in the **Pipeline work directory** field (step 10 of this guide).
    - To use an existing FSx file system, enter the **FSx DNS name** and **FSx mount path**. The FSx mount path is the path where the FSx volume is accessible to the compute environment. For simplicity, we recommend that you use `/mnt/fsx` as the FSx mount path.
    - To create a new FSx file system, enter the **FSx size** (in GB) and the **FSx mount path**. We advise that you specify `/mnt/fsx` as the FSx mount path.
    - FSx file systems created by Batch Forge are automatically tagged in AWS with `Name=TowerForge-<compute-env-id>`, with `<compute-env-id>` being the compute environment ID. Any manually-added resource label with the key `Name` (capital N) will override the automatically-assigned `TowerForge-<compute-env-id>` label.
+    - A custom EC2 security group needs to be configured to allow the compute environment to access the FSx file system.
+    * Visit the [AWS Console for Security groups](https://console.aws.amazon.com/ec2/home?#SecurityGroups) and switch to the region where your workload will run.
+    * Select **Create security group**.
+    * Enter a relevant name, such as `seqera-fsx-access-sg`, and a description, e.g., _FSx access for Seqera Batch compute environment_.
+    * Empty both the **Inbound rules** and **Outbound rules** sections by deleting the default rules.
+    * Optionally add **Tags** to the security group, then select **Create security group**.
+    * After creating the security group, select it from the security groups list, select the **Inbound rules** tab, then select **Edit inbound rules**.
+    * Select **Add rule** and configure the new rule as follows:
+      - **Type**: `Custom TCP`
+      - **Port range**: `988`
+      - **Source**: `Custom`, then enter the ID of the security group that you're editing (you can search for it by name, e.g., `seqera-fsx-access-sg`). This allows resources associated with the same security group to communicate with each other.
+    * Repeat the step to add another rule with:
+      - **Type**: `Custom TCP`
+      - **Port range**: `1018-1023`
+      - **Source**: `Custom`, the same as above.
+    * Select **Save rules** to finalize the inbound rule configuration.
+    * Repeat the same steps to add an outbound rule that allows all outbound traffic: set the type to `All traffic` and the destination to `Anywhere-IPv4`/`Anywhere-IPv6`.
+    * See the [AWS documentation about FSx security groups](https://docs.aws.amazon.com/fsx/latest/LustreGuide/limit-access-security-groups.html) for more information.
+    * The security group must then be specified in the **Advanced options** below to allow the compute environment to access the FSx file system.
+  - You may need to install the Lustre client in the AMI used by your compute environment to access FSx file systems. See [Installing the Lustre client](https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html) for more information.
+
+1.
Select **Dispose resources** to automatically delete all AWS resources created by Seqera Platform when you delete the compute environment, including EFS/FSx file systems. +1. Apply [**Resource labels**](../resource-labels/overview) to the cloud resources produced by this compute environment. Workspace default resource labels are prefilled. 1. Expand **Staging options** to include: - Optional [pre- or post-run Bash scripts](../launch/advanced#pre-and-post-run-scripts) that execute before or after the Nextflow pipeline execution in your environment. - Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Values defined here are pre-filled in the **Nextflow config file** field in the pipeline launch form. These values can be overridden during pipeline launch. @@ -686,18 +796,20 @@ See [Launch pipelines](../launch/launchpad) to start executing workflows in your Seqera Platform compute environments for AWS Batch include advanced options to configure instance types, resource allocation, custom networking, and CloudWatch and ECS agent integration. -**Batch Forge AWS Batch advanced options** +#### Seqera AWS Batch advanced options -Specify the **Allocation strategy** and indicate any preferred **Instance types**. AWS applies quotas for the number of running and requested [Spot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html) and [On-Demand](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-on-demand-instances.html#ec2-on-demand-instances-limits) instances per account. AWS will allocate instances from up to 20 instance types, based on those requested for the compute environment. AWS excludes the largest instances when you request more than 20 instance types. +- Specify the **Allocation strategy** and indicate any preferred **Instance types**. 
AWS applies quotas for the number of running and requested [Spot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html) and [On-Demand](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-on-demand-instances.html#ec2-on-demand-instances-limits) instances per account. AWS will allocate instances from up to 20 instance types, based on those requested for the compute environment. AWS excludes the largest instances when you request more than 20 instance types.
  :::note
  If these advanced options are omitted, allocation strategy defaults are `BEST_FIT_PROGRESSIVE` for On-Demand and `SPOT_PRICE_CAPACITY_OPTIMIZED` for Spot compute environments.
  :::
  :::caution
-  tw CLI v0.8 and earlier does not support the `SPOT_PRICE_CAPACITY_OPTIMIZED` allocation strategy in AWS Batch. You cannot currently use CLI to create or otherwise interact with AWS Batch Spot compute environments that use this allocation strategy.
+  Platform CLI (known as `tw`) v0.8 and earlier does not support the `SPOT_PRICE_CAPACITY_OPTIMIZED` allocation strategy in AWS Batch. You cannot currently use the CLI to create or otherwise interact with AWS Batch Spot compute environments that use this allocation strategy.
  :::
- Configure a custom networking setup using the **VPC ID**, **Subnets**, and **Security groups** fields.
+  * If not defined, the default VPC, subnets, and security groups for the selected region will be used.
+  * When using EFS or FSx file systems, select the security group previously created to allow access to the file system. The security group must belong to the same VPC as the one defined for the Seqera Batch compute environment.
- You can specify a custom **AMI ID**.
  :::note
You will need a [Batch queue, a Batch compute environment, an IAM user, and an S3 bucket](../enterprise/advanced-topics/manual-aws-batch-setup.mdx) already set up.
-
-To enable Seqera in your existing AWS configuration, you need an IAM user with the following permissions:
-
-- `AmazonS3ReadOnlyAccess`
-- `AmazonEC2ContainerRegistryReadOnly`
-- `CloudWatchLogsReadOnlyAccess`
-- A [custom policy](https://github.com/seqeralabs/nf-tower-aws/blob/master/launch/launch-policy.json) to grant the ability to submit and control Batch jobs
-- Write access to any S3 bucket used by pipelines with [this policy template](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/README.md#s3-access-optional)
-
-### S3 bucket access
-
-Seqera can use S3 to store the intermediate files and output data generated by pipeline executions. Create a policy for your Seqera IAM user that grants access to specific buckets.
+This section is for users with a pre-configured AWS environment. Follow the [AWS Batch queue and compute environment creation instructions](../enterprise/advanced-topics/manual-aws-batch-setup.mdx) to set up the required AWS Batch resources in your account.
-**Assign an S3 access policy to Seqera IAM users**
+An [S3 bucket](#s3-bucket-creation) or EFS/FSx file system is required to store Nextflow intermediate files when using Seqera with AWS Batch.
-1. Go to the IAM User table in the [IAM service](https://console.aws.amazon.com/iam/home).
-1. Select the IAM user.
-1. Select **Add inline policy**.
-1. Copy the contents of [this policy](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/README.md#s3-access-optional) into the **JSON** tab. Replace `YOUR-BUCKET-NAME` (lines 10 and 21) with your bucket name.
-1. Name your policy and select **Create policy**.
+Refer to the [IAM user creation](#iam-user-creation) section to ensure that your IAM user has the necessary permissions to run pipelines in Seqera Platform.
Remove any permissions that are not required for your use case.

### Seqera manual compute environment

-With your AWS environment and resources set up and your user permissions configured, create an AWS Batch compute environment in Seqera manually.
+With your AWS environment and resources set up and your user permissions configured, create an AWS Batch compute environment in Seqera.

:::caution
-Your Seqera compute environment uses resources that you may be charged for in your AWS account. See [Cloud costs](../monitoring/cloud-costs) for guidelines to manage cloud resources effectively and prevent unexpected costs.
+AWS Batch creates resources that you may be charged for in your AWS account. See [Cloud costs](../monitoring/cloud-costs) for guidelines to manage cloud resources effectively and prevent unexpected costs.
:::

-**Create a manual Seqera compute environment**
-
-1. In a workspace, select **Compute environments > New environment**.
-1. Enter a descriptive name for this environment, e.g., _AWS Batch Manual (eu-west-1)_.
+1. After logging in to [Seqera](https://cloud.seqera.io) and selecting a workspace from the dropdown menu at the top of the page, select **Compute environments** from the navigation menu.
+1. Select **Add compute environment**.
+1. Enter a descriptive name for this environment, e.g., _AWS Batch Manual (eu-west-1)_.
1. Select **AWS Batch** as the target platform.
-1. Select **+** to add new credentials.
-1. Enter a name for the credentials, e.g., _AWS Credentials_.
-1. Enter the **Access key** and **Secret key** for your IAM user.
+1. From the **Credentials** drop-down, select existing AWS credentials, or select **+** to add new credentials. If you're using existing credentials, skip to step 9.
   :::note
   You can create multiple credentials in your Seqera environment. See [Credentials](../credentials/overview).
   :::
-1. Select a **Region**, e.g., _eu-west-1 - Europe (Ireland)_.
-1.
Enter an S3 bucket path for the **Pipeline work directory**, e.g., `s3://seqera-bucket`. This bucket must be in the same region chosen in the previous step.
+1. Enter a name, e.g., _AWS Credentials_.
+1. Add the **Access key** and **Secret key** you [previously obtained](#obtain-iam-user-credentials) when you created the Seqera IAM user.
+1. (Optional) Under **Assume role**, specify the IAM role to be assumed by the Seqera IAM user to access the compute environment's AWS resources.
+   :::note
+   When using AWS keys without an assumed role, the associated AWS user must be granted permissions to operate on the cloud resources directly. When an assumed role is provided, the IAM user keys are only used to retrieve temporary credentials that impersonate the specified role. This is useful when, for example, multiple IAM users access the same AWS account and the permissions to operate on the resources are granted only to the role.
+   :::
+1. Select a **Region**, e.g., _eu-west-1 - Europe (Ireland)_. This region must match the region where your S3 bucket or EFS/FSx work directory is located to avoid high data transfer costs.
+1. In the **Pipeline work directory** field, enter or select from the dropdown menu the S3 bucket you [previously created](#s3-bucket-creation), e.g., `s3://seqera-bucket`. The bucket must be in the same region chosen in the previous step. The work directory can be customized to specify a folder inside the bucket, e.g., `s3://seqera-bucket/nextflow-workdir`.
   :::note
   When you specify an S3 bucket as your work directory, this bucket is used for the Nextflow [cloud cache](https://www.nextflow.io/docs/latest/cache-and-resume.html#cache-stores) by default. Seqera adds a `cloudcache` block to the Nextflow configuration file for all runs executed with this compute environment. This block includes the path to a `cloudcache` folder in your work directory, e.g., `s3://seqera-bucket/cloudcache/.cache`.
You can specify an alternative cache location with the **Nextflow config file** field on the pipeline [launch](../launch/launchpad#launch-form) form.
   :::
+
+   Similarly, you can provide an EFS or FSx file system as your work directory.
+   :::warning
+   Using an EFS file system as your work directory is currently incompatible with [Studios](../studios/overview), and will result in errors with checkpoints and mounted data.
+   :::
+
1. Select **Enable Wave containers** to facilitate access to private container repositories and provision containers in your pipelines using the Wave containers service. See [Wave containers](https://www.nextflow.io/docs/latest/wave.html) for more information.
1. Select **Enable Fusion v2** to allow access to your S3-hosted data via the [Fusion v2](https://docs.seqera.io/fusion) virtual distributed file system. This speeds up most data operations. The Fusion v2 file system requires Wave containers to be enabled. See [Fusion file system](../supported_software/fusion/overview) for configuration details.
@@ -795,6 +900,7 @@ Your Seqera compute environment uses resources that you may be charged for in yo
   1. Use Seqera Platform version 23.1 or later.
   1. Use an S3 bucket as the pipeline work directory.
   1. Enable **Wave containers**, **Fusion v2**, and **fast instance storage**.
+  1. Select the **Batch Forge** config mode.
   1. Fast instance storage requires an EC2 instance type that uses NVMe disks. Specify NVMe-based instance types in **Instance types** under **Advanced options**. If left unspecified, Platform selects instances from AWS NVMe-based instance type families. See [Instance store temporary block storage for EC2 instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html) for more information.
   :::note
Select **Enable Fusion Snapshots (beta)** to enable Fusion to automatically restore jobs that are interrupted when an AWS Spot instance reclamation occurs. Requires Fusion v2. See [Fusion Snapshots](https://docs.seqera.io/fusion/guide/snapshots) for more information.
+
+1. Set the **Config mode** to **Manual**.
-1. Enter the **Head queue**, which is the name of the AWS Batch queue that the Nextflow main job will run.
+1. Enter the **Head queue** created by following the [manual setup instructions](../enterprise/advanced-topics/manual-aws-batch-setup.mdx). This is the name of the AWS Batch queue where the Nextflow main job runs.
1. Enter the **Compute queue**, which is the name of the AWS Batch queue where tasks will be submitted.
-1. Apply [**Resource labels**](../resource-labels/overview) to the cloud resources consumed by this compute environment. Workspace default resource labels are prefilled.
+1. Apply [**Resource labels**](../resource-labels/overview) to the cloud resources produced by this compute environment. Workspace default resource labels are prefilled.
1. Expand **Staging options** to include:
   - Optional [pre- or post-run Bash scripts](../launch/advanced#pre-and-post-run-scripts) that execute before or after the Nextflow pipeline execution in your environment.
   - Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Values defined here are pre-filled in the **Nextflow config file** field in the pipeline launch form. These values can be overridden during pipeline launch.
@@ -840,8 +947,6 @@ See [Launch pipelines](../launch/launchpad) to start executing workflows in your
Seqera compute environments for AWS Batch include advanced options to configure resource allocation, execution roles, custom AWS CLI tool paths, and CloudWatch integration.
-**Seqera AWS Batch advanced options**
-
- Use **Head Job CPUs** and **Head Job Memory** to specify the hardware resources allocated for the Nextflow head job.
The default head job memory allocation is 4096 MiB.
- Use **Head Job role** and **Compute Job role** to grant fine-grained IAM permissions to the Head Job and Compute Jobs.
- Add an execution role ARN to the **Batch execution role** field to grant permissions to make API calls on your behalf to the ECS container used by Batch. This is required if the pipeline launched with this compute environment needs access to the secrets stored in this workspace. This field can be ignored if you are not using secrets.
diff --git a/platform-cloud/docs/compute-envs/eks.md b/platform-cloud/docs/compute-envs/eks.md
index 299c6e4ea..6745aa64f 100644
--- a/platform-cloud/docs/compute-envs/eks.md
+++ b/platform-cloud/docs/compute-envs/eks.md
@@ -69,7 +69,7 @@ After you have prepared your Kubernetes cluster and assigned a service account r
1. Add the IAM user **Access key** and **Secret key**. This is the IAM user with the service account role detailed in the previous section.
1. (Optional) Under **Assume role**, specify the IAM role to be assumed by the Seqera IAM user to access the compute environment AWS resources.
   :::note
-   When using AWS keys without an assumed role, the associated AWS user account must have Seqera [Launch](https://github.com/seqeralabs/nf-tower-aws/tree/master/launch) and [Forge](https://github.com/seqeralabs/nf-tower-aws/tree/master/forge) permissions. When an assumed role is provided, the keys are only used to retrieve temporary credentials impersonating the role specified. In this case, Seqera [Launch](https://github.com/seqeralabs/nf-tower-aws/tree/master/launch) and [Forge](https://github.com/seqeralabs/nf-tower-aws/tree/master/forge) permissions must be granted to the role instead of the user account.
+   When using AWS keys without an assumed role, the associated AWS user account must have all the appropriate [IAM permissions](./aws-batch.md#required-platform-iam-permissions).
When an assumed role is provided, the keys are only used to retrieve temporary credentials impersonating the role specified: in this case, the permissions must be granted to the role instead of the user account, and the user must have the `sts:AssumeRole` permission for the role (see [AWS Batch IAM role creation (optional)](./aws-batch.md#iam-role-creation-optional)). ::: 1. Select a **Region**, e.g., _eu-west-1 - Europe (Ireland)_. 1. Select a **Cluster name** from the list of available EKS clusters in the selected region. @@ -85,9 +85,9 @@ After you have prepared your Kubernetes cluster and assigned a service account r 1. Apply [**Resource labels**](../resource-labels/overview) to the cloud resources consumed by this compute environment. Workspace default resource labels are prefilled. 1. Expand **Staging options** to include: - Optional [pre- or post-run Bash scripts](../launch/advanced#pre-and-post-run-scripts) that execute before or after the Nextflow pipeline execution in your environment. - - Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Values defined here are pre-filled in the **Nextflow config file** field in the pipeline launch form. These values can be overridden during pipeline launch. + - Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Values defined here are pre-filled in the **Nextflow config file** field in the pipeline launch form. These values can be overridden during pipeline launch. :::info - Configuration settings in this field override the same values in the pipeline repository `nextflow.config` file. See [Nextflow config file](../launch/advanced#nextflow-config-file) for more information on configuration priority. + Configuration settings in this field override the same values in the pipeline repository `nextflow.config` file. 
See [Nextflow config file](../launch/advanced#nextflow-config-file) for more information on configuration priority.
   :::
1. Specify custom **Environment variables** for the **Head job** and/or **Compute jobs**.
1. Configure any advanced options described in the next section, as needed.
@@ -191,4 +191,3 @@ To use [Fusion v2](https://docs.seqera.io/fusion) in your Seqera EKS compute env
See the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html) for further details.
-
diff --git a/platform-cloud/docs/data/data-explorer.md b/platform-cloud/docs/data/data-explorer.md
index 0f39968c5..4b3aeee43 100644
--- a/platform-cloud/docs/data/data-explorer.md
+++ b/platform-cloud/docs/data/data-explorer.md
@@ -22,7 +22,7 @@ Data Explorer lists public and private data repositories. Repositories accessibl
- **Retrieve data repositories with workspace credentials**
-  Private data repositories accessible to the credentials defined in your workspace are listed in Data Explorer automatically. The permissions required for your [AWS](../compute-envs/aws-batch#iam), [Google Cloud](../compute-envs/google-cloud-batch#iam), [Azure Batch](../compute-envs/azure-batch#storage-account), or Amazon S3-compatible API storage: credentials allow full Data Explorer functionality.
+  Private data repositories accessible to the credentials defined in your workspace are listed in Data Explorer automatically. The permissions required for your [AWS](../compute-envs/aws-batch#required-platform-iam-permissions), [Google Cloud](../compute-envs/google-cloud-batch#iam), [Azure Batch](../compute-envs/azure-batch#storage-account), or Amazon S3-compatible API storage credentials allow full Data Explorer functionality.
- **Configure individual data repositories manually**
diff --git a/platform-cloud/docs/enterprise/advanced-topics/manual-aws-batch-setup.mdx b/platform-cloud/docs/enterprise/advanced-topics/manual-aws-batch-setup.mdx
index 255d8b809..981de904c 100644
--- a/platform-cloud/docs/enterprise/advanced-topics/manual-aws-batch-setup.mdx
+++ b/platform-cloud/docs/enterprise/advanced-topics/manual-aws-batch-setup.mdx
@@ -11,14 +11,11 @@ import TabItem from '@theme/TabItem';
This page describes how to set up AWS roles and Batch queues manually for the deployment of Nextflow workloads with Seqera Platform.
:::tip
-Manual AWS Batch configuration is only necessary if you don't use Batch Forge.
-
-Batch Forge _automatically creates_ the AWS Batch queues required for your workflow executions.
+Manual AWS Batch configuration is only necessary if you don't want Seqera Platform to create the required AWS Batch resources in your AWS account automatically, using its internal provisioning tool, Batch Forge.
:::
-Complete the following procedures to configure AWS Batch manually:
+Complete the following steps to configure the AWS Batch resources needed by Seqera Platform:
-1. Create a user policy.
2. Create the instance role policy.
3. Create the AWS Batch service role.
4. Create an EC2 Instance role.
@@ -27,39 +24,6 @@ Complete the following procedures to configure AWS Batch manually:
7. Create the AWS Batch compute environments.
8. Create the AWS Batch queue.
-### Create a user policy
-
-Create the policy for the user launching Nextflow jobs:
-
-1. In the [IAM Console](https://console.aws.amazon.com/iam/home), select **Create policy** from the Policies page.
-1.
Create a new policy with the following content: - - ```json - { - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "Stmt1530313170000", - "Effect": "Allow", - "Action": [ - "batch:CancelJob", - "batch:RegisterJobDefinition", - "batch:DescribeComputeEnvironments", - "batch:DescribeJobDefinitions", - "batch:DescribeJobQueues", - "batch:DescribeJobs", - "batch:ListJobs", - "batch:SubmitJob", - "batch:TerminateJob" - ], - "Resource": ["*"] - } - ] - } - ``` - -1. Save with it the name `seqera-user`. - ### Create the instance role policy Create the policy with a role that allows Seqera to submit Batch jobs on your EC2 instances: @@ -174,7 +138,7 @@ Create a launch template to configure the EC2 instances deployed by Batch jobs: -a fetch-config \ -m ec2 \ -s \ - -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json + -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json mkdir -p /scratch/fusion NVME_DISKS=($(nvme list | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }')) NUM_DISKS=${#NVME_DISKS[@]} @@ -276,8 +240,8 @@ Create a launch template to configure the EC2 instances deployed by Batch jobs: ### Create the Batch compute environments -:::caution -AWS Graviton instances (ARM64 CPU architecture) are not supported in manual compute environments. To use Graviton instances, create your AWS Batch compute environment with [Batch Forge](../../compute-envs/aws-batch#batch-forge-compute-environment). +:::caution +AWS Graviton instances (ARM64 CPU architecture) are not supported in manual compute environments. To use Graviton instances, create your AWS Batch compute environment with [Batch Forge](../../compute-envs/aws-batch#create-a-seqera-aws-batch-compute-environment). ::: Nextflow makes use of two job queues during workflow execution: @@ -301,7 +265,7 @@ The head queue requires an on-demand compute environment. Do not select **Use Sp 1. 
In the [Batch Console](https://eu-west-1.console.aws.amazon.com/batch/home), select **Create** on the Compute environments page. 1. Select **Amazon EC2** as the compute environment configuration. :::note - Seqera AWS Batch compute environments created with [Batch Forge](../../compute-envs/aws-batch#batch-forge-compute-environment) support using Fargate for the head job, but manual compute environments must use EC2. + Seqera AWS Batch compute environments created with [Batch Forge](../../compute-envs/aws-batch#create-a-seqera-aws-batch-compute-environment) support using Fargate for the head job, but manual compute environments must use EC2. ::: 1. Enter a name of your choice, and apply the `seqera-servicerole` and `seqera-instancerole`. 1. Enter vCPU limits and instance types, if needed. @@ -310,7 +274,7 @@ The head queue requires an on-demand compute environment. Do not select **Use Sp ::: 1. Expand **Additional configuration** and select the `seqera-launchtemplate` from the Launch template dropdown. 1. Configure VPCs, subnets, and security groups on the next page as needed. -1. Review your configuration and select **Create compute environment**. +1. Review your configuration and select **Create compute environment**. @@ -320,11 +284,11 @@ Create this compute environment to use Spot instances for your workflow compute 1. In the [Batch Console](https://eu-west-1.console.aws.amazon.com/batch/home), select **Create** on the Compute environments page. 1. Select **Amazon EC2** as the compute environment configuration. 1. Enter a name of your choice, and apply the `seqera-servicerole` and `seqera-instancerole`. -1. Select **Enable using Spot instances** to use Spot instances and save computing costs. +1. Select **Enable using Spot instances** to use Spot instances and save computing costs. 1. Select the `seqera-fleetrole` and enter vCPU limits and instance types, if needed. 1. 
Expand **Additional configuration** and select the `seqera-launchtemplate` from the Launch template dropdown. 1. Configure VPCs, subnets, and security groups on the next page as needed. -1. Review your configuration and select **Create compute environment**. +1. Review your configuration and select **Create compute environment**. diff --git a/platform-cloud/docs/getting-started/proteinfold.md b/platform-cloud/docs/getting-started/proteinfold.md index dcaf43702..3e411ab64 100644 --- a/platform-cloud/docs/getting-started/proteinfold.md +++ b/platform-cloud/docs/getting-started/proteinfold.md @@ -19,7 +19,7 @@ You will need the following to get started: - [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch. - An existing AWS cloud account with access to the AWS Batch service. -- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam) for guidance to set up IAM permissions for Platform. +- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#required-platform-iam-permissions) for guidance to set up IAM permissions for Platform. ::: ## Compute environment diff --git a/platform-cloud/docs/getting-started/rnaseq.md b/platform-cloud/docs/getting-started/rnaseq.md index ee30b0db5..523908a1d 100644 --- a/platform-cloud/docs/getting-started/rnaseq.md +++ b/platform-cloud/docs/getting-started/rnaseq.md @@ -21,7 +21,7 @@ You will need the following to get started: - [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch. - An existing AWS cloud account with access to the AWS Batch service. 
-- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam) for guidance to set up IAM permissions for Platform. +- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#required-platform-iam-permissions) for guidance to set up IAM permissions for Platform. ::: ## Compute environment diff --git a/platform-cloud/docs/getting-started/studios.md b/platform-cloud/docs/getting-started/studios.md index bec92d1ca..66b741bc2 100644 --- a/platform-cloud/docs/getting-started/studios.md +++ b/platform-cloud/docs/getting-started/studios.md @@ -13,7 +13,7 @@ This guide explores how Studios integrates with your existing workflows, bridgin You will need the following to get started: - At least the **Maintain** workspace [user role](../orgs-and-teams/roles) to create and configure Studios. -- An [AWS Batch compute environment](../compute-envs/aws-batch#batch-forge-compute-environment) (**without Fargate**) with sufficient resources (minimum: 2 CPUs, 8192 MB RAM). +- An [AWS Batch compute environment](../compute-envs/aws-batch#create-a-seqera-aws-batch-compute-environment) (**without Fargate**) with sufficient resources (minimum: 2 CPUs, 8192 MB RAM). - Valid [credentials](../credentials/overview) for your cloud storage account and compute environment. - [Data Explorer](../data/data-explorer) enabled in your workspace. ::: @@ -32,7 +32,7 @@ This script and instructions can also be used to visualize the structures from * #### Create an AWS Batch compute environment -Studios require an AWS Batch compute environment. If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#batch-forge-compute-environment) with the following attributes: +Studios require an AWS Batch compute environment. 
If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#create-a-seqera-aws-batch-compute-environment) with the following attributes: - **Region**: To minimize costs, your compute environment should be in the same region as your data. To browse the nf-core AWS megatests public data optimally, select **eu-west-1**. - **Provisioning model**: Use **On-demand** EC2 instances. @@ -366,7 +366,7 @@ An R-IDE enables interactive analysis using R libraries and tools. For example, #### Create an AWS Batch compute environment -Studios require an AWS Batch compute environment. If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#batch-forge-compute-environment) with the following attributes: +Studios require an AWS Batch compute environment. If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#create-a-seqera-aws-batch-compute-environment) with the following attributes: - **Region**: To minimize costs, your compute environment should be in the same region as your data. To browse the nf-core AWS megatests public data optimally, select **eu-west-1**. - **Provisioning model**: Use **On-demand** EC2 instances. @@ -463,7 +463,7 @@ Xpra provides remote desktop functionality that enables many interactive analysi #### Create an AWS Batch compute environment -Studios require an AWS Batch compute environment. If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#batch-forge-compute-environment) with the following attributes: +Studios require an AWS Batch compute environment. If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#create-a-seqera-aws-batch-compute-environment) with the following attributes: - **Region**: To minimize costs, your compute environment should be in the same region as your data. 
To browse the 1000 Genomes public data optimally, select **us-east-1**. - **Provisioning model**: Use **On-demand** EC2 instances. @@ -531,7 +531,7 @@ Using Studios and Visual Studio Code allows you to create a portable and interac #### Create an AWS Batch compute environment -Studios require an AWS Batch compute environment. If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#batch-forge-compute-environment) with the following attributes: +Studios require an AWS Batch compute environment. If you do not have an existing compute environment available, [create one](../compute-envs/aws-batch#create-a-seqera-aws-batch-compute-environment) with the following attributes: - **Region**: To minimize costs, your compute environment should be in the same region as your data. To use the iGenomes public data bucket that contains the *nf-core/fetchngs* `test` profile data, select **eu-west-1**. - **Provisioning model**: Use **On-demand** EC2 instances. diff --git a/platform-cloud/docs/resource-labels/overview.md b/platform-cloud/docs/resource-labels/overview.md index 32f3559f0..e619848a1 100644 --- a/platform-cloud/docs/resource-labels/overview.md +++ b/platform-cloud/docs/resource-labels/overview.md @@ -37,7 +37,7 @@ If a compute environment is created with Batch Forge, it propagates resource lab ### Resource labels applied to a pipeline run -A run inherits resource labels applied at the compute environment, pipeline, and action level. Resource labels can also be added or overridden during pipeline launch. +A run inherits resource labels applied at the compute environment, pipeline, and action level. Resource labels can also be added or overridden during pipeline launch. When a run is executed with resource labels attached: @@ -46,7 +46,7 @@ When a run is executed with resource labels attached: ### Resource labels applied to a Studio -A Studio inherits resource labels applied at the compute environment level. 
Resource labels can also be added or overridden when you add a Studio. +A Studio inherits resource labels applied at the compute environment level. Resource labels can also be added or overridden when you add a Studio. When a Studio starts with resource labels attached: @@ -66,7 +66,7 @@ When a Studio starts with resource labels attached: 1. Select **Add label**. 1. Under **Type**, select **Resource label**. 1. Enter a **Name** such as `owner`, `team`, or `platform-run`. -1. Enter a **Value**: +1. Enter a **Value**: - **Standard resource labels**: ``, `TEAM_NAME` - **[Dynamic resource labels](#dynamic-resource-labels)**: Use variable syntax — `${workflowId}` or `${sessionId}` 1. Optionally, enable **Use as default in compute environment form** to automatically apply this label to all new compute environments in this workspace. @@ -85,10 +85,10 @@ The deletion of a resource label from a workspace has no influence on the cloud Once created at the workspace level, resource labels can be applied to: - **Compute environments**: In the **Resource labels** field when creating a new compute environment. Once the compute environment has been created, its resource labels cannot be edited. -- **Pipelines**: In the **Resource labels** field when adding or editing a pipeline. +- **Pipelines**: In the **Resource labels** field when adding or editing a pipeline. - **Actions**: In the **Resource labels** field when creating or editing an action. -- **Pipeline runs**: In the **Resource labels** field when launching a pipeline. -- **Studios**: In the **Resource labels** field when adding a Studio. +- **Pipeline runs**: In the **Resource labels** field when launching a pipeline. +- **Studios**: In the **Resource labels** field when adding a Studio. Resource labels from the compute environment or pipeline are prefilled in the pipeline launch form, and compute environment resource labels are prefilled in the Studio add form. 
You can apply or override these labels when you launch a pipeline or add a Studio. Workspace maintainers can override default resource labels inherited from the compute environment when they create or edit pipelines, actions, runs, and Studios. Custom resource labels associated with each element propagate to resources in your cloud provider account. They don't alter the default resource labels on the compute environment. @@ -98,7 +98,7 @@ For example, the resource label `name=ce1` is set during AWS Batch compute envir If a maintainer changes the compute environment associated with a pipeline, the **Resource labels** field is updated with the resource labels from the new compute environment. -## Dynamic resource labels +## Dynamic resource labels Dynamic resource labels extend the standard resource labels functionality by allowing variable values that are populated with unique workflow identifiers at runtime. This enables precise cost tracking and resource attribution for individual pipeline runs across cloud compute environments. @@ -164,11 +164,11 @@ The following resources are tagged using the labels associated with the compute At execution time, when jobs are submitted to Batch, the requests are set up to propagate tags to all the instances and volumes created by the head job. -The [`forge-policy.json` file](https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/forge-policy.json) contains the roles needed for Batch Forge-created AWS Batch compute environments to tag AWS resources. Specifically, the required roles are `iam:TagRole`, `iam:TagInstanceProfile`, and `batch:TagResource`. +The [IAM permissions](../compute-envs/aws-batch.md#required-platform-iam-permissions) include the actions needed for Batch Forge-created AWS Batch compute environments to tag AWS resources. Specifically, the required actions are `iam:TagRole`, `iam:TagInstanceProfile`, and `batch:TagResource`. 
To view and manage the resource labels applied to AWS resources by Seqera and Nextflow, go to the [AWS Tag Editor](https://docs.aws.amazon.com/tag-editor/latest/userguide/find-resources-to-tag.html) (as an administrative user) and follow these steps: -1. Under **Find resources to tag**, search for the resource label key and value in the relevant search fields under **Tags**. Your search can be further refined by AWS region and resource type. +1. Under **Find resources to tag**, search for the resource label key and value in the relevant search fields under **Tags**. Your search can be further refined by AWS region and resource type. 1. Select **Search resources**. **Resource search results** display all the resources tagged with your given resource label key and/or value. ### Include Seqera resource labels in AWS billing reports @@ -189,12 +189,12 @@ To include the cost information associated with your resource labels in your AWS - Choose **Activate** - Allow up to 24 hours for tags to activate -3. **For static resource labels - View in Cost Explorer or Data Exports**: +3. **For static resource labels - View in Cost Explorer or Data Exports**: - Navigate to AWS Cost Explorer and use **Group by** filters to organize costs by your activated tag keys - Create [cost allocation reports](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/configurecostallocreport.html#allocation-viewing) including your resource label tags - Alternatively, view in Data Exports and QuickSight dashboards for more detailed analysis -4. **For dynamic resource labels - Enable split cost allocation and view in Data Exports**: +4. 
**For dynamic resource labels - Enable split cost allocation and view in Data Exports**: - [Enable split cost allocation data](https://docs.aws.amazon.com/cur/latest/userguide/enabling-split-cost-allocation-data.html) in your Cost and Usage Reports preferences - View costs in your [Data Exports](https://docs.aws.amazon.com/cur/latest/userguide/what-is-data-exports.html) and Cost and Usage Reports (CUR) - Query reports using Amazon Athena or visualize in Amazon QuickSight dashboards (requires a QuickSight subscription) @@ -246,7 +246,7 @@ See [here](https://cloud.google.com/resource-manager/docs/creating-managing-labe ### Azure -The system used for labeling resources in Azure differs depending on your compute environment type: +The system used for labeling resources in Azure differs depending on your compute environment type: - In an **Azure Batch** compute environment created with Batch Forge, resource labels are added to the Pool parameters — this adds a set of `key=value` **metadata** pairs to the Azure Batch Pool. - In an **Azure Cloud** (single instance) compute environment, resource labels are propagated to VMs and related resources as **tags**. @@ -324,4 +324,4 @@ See [Syntax and character set](https://kubernetes.io/docs/concepts/overview/work ## Troubleshooting -See [Resource labels](../troubleshooting_and_faqs/resource-labels.md) for troubleshooting common resource label propagation errors. +See [Resource labels](../troubleshooting_and_faqs/resource-labels.md) for troubleshooting common resource label propagation errors. diff --git a/platform-cloud/docs/secrets/overview.md b/platform-cloud/docs/secrets/overview.md index 7ba61f4ee..adf90cb66 100644 --- a/platform-cloud/docs/secrets/overview.md +++ b/platform-cloud/docs/secrets/overview.md @@ -35,39 +35,13 @@ When you launch a new workflow, all secrets are sent to the corresponding secret Secrets are automatically deleted from the secret manager when the pipeline completes, successfully or unsuccessfully. 
-:::note -In AWS Batch compute environments, Seqera passes stored secrets to jobs as part of the Seqera-created job definition. Seqera secrets cannot be used in Nextflow processes that use a [custom job definition](https://www.nextflow.io/docs/latest/aws.html#custom-job-definition). +:::note +In AWS Batch compute environments, Seqera passes stored secrets to jobs as part of the Seqera-created job definition. Seqera secrets cannot be used in Nextflow processes that use a [custom job definition](https://www.nextflow.io/docs/latest/aws.html#custom-job-definition). ::: ## AWS Secrets Manager integration -Seqera and associated AWS Batch IAM Roles require additional permissions to interact with AWS Secrets Manager. - -### Seqera instance permissions - -Augment the existing instance [permissions](https://github.com/seqeralabs/nf-tower-aws) with this policy: - -**IAM Permissions** - -Augment the permissions given to Seqera with the following Sid: - -```json - { - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "AllowTowerEnterpriseSecrets", - "Effect": "Allow", - "Action": [ - "secretsmanager:DeleteSecret", - "secretsmanager:ListSecrets", - "secretsmanager:CreateSecret" - ], - "Resource": "*" - } - ] - } -``` +Seqera and its associated AWS Batch IAM roles require [specific permissions](../compute-envs/aws-batch#pipeline-secrets-optional) to interact with AWS Secrets Manager. :::note If you plan to limit the scope of the Secrets Manager IAM policy, ensure that the `ListSecrets` action remains granted on all resources (`"Resource": "*"`).
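As an illustration only, a scoped-down version of the policy removed above might look like the following sketch. The `tower-*` secret-name pattern and the `123456789012` account ID are placeholder assumptions, not a documented Seqera convention; `secretsmanager:ListSecrets` does not support resource-level restrictions, so it stays on `"Resource": "*"`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowScopedSecretWrites",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:CreateSecret",
        "secretsmanager:DeleteSecret"
      ],
      "Resource": "arn:aws:secretsmanager:*:123456789012:secret:tower-*"
    },
    {
      "Sid": "AllowListSecretsEverywhere",
      "Effect": "Allow",
      "Action": "secretsmanager:ListSecrets",
      "Resource": "*"
    }
  ]
}
```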