diff --git a/manifests/modules/fundamentals/storage/fsxl/.workshop/cleanup.sh b/manifests/modules/fundamentals/storage/fsxl/.workshop/cleanup.sh
new file mode 100755
index 0000000000..9d558e0d8b
--- /dev/null
+++ b/manifests/modules/fundamentals/storage/fsxl/.workshop/cleanup.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+set -e
+
+logmessage "Deleting assets-images folder..."
+
+# Delete local directory of image files
+rm -rf ~/environment/assets-images/
+
+check=$(helm list -n kube-system | grep aws-fsx-csi-driver || true)
+
+logmessage "Scaling down assets deployment..."
+
+kubectl scale -n assets --replicas=0 deployment/assets
+
+if [ ! -z "$check" ]; then
+  logmessage "Deleting FSx for Lustre CSI driver addon..."
+
+  helm uninstall aws-fsx-csi-driver -n kube-system
+fi
+
+logmessage "Deleting PV and PVC that were created..."
+
+# Delete PVC
+kubectl delete pvc fsx-claim -n assets --ignore-not-found=true
+
+# Delete PV
+kubectl delete pv fsx-pv --ignore-not-found=true
+
diff --git a/manifests/modules/fundamentals/storage/fsxl/.workshop/terraform/main.tf b/manifests/modules/fundamentals/storage/fsxl/.workshop/terraform/main.tf
new file mode 100644
index 0000000000..c86a6212b1
--- /dev/null
+++ b/manifests/modules/fundamentals/storage/fsxl/.workshop/terraform/main.tf
@@ -0,0 +1,156 @@
+# Attach AmazonFSxFullAccess managed policy
+resource "aws_iam_role_policy_attachment" "fsx_full_access" {
+  role       = "eks-workshop-ide-role"
+  policy_arn = "arn:aws:iam::aws:policy/AmazonFSxFullAccess"
+}
+
+# Add after the policy attachment
+resource "time_sleep" "wait_for_policy_propagation" {
+  depends_on      = [aws_iam_role_policy_attachment.fsx_full_access]
+  create_duration = "5s" # reduce to minimum amount possible
+}
+
+# Add Service_Linked_Role inline policy
+resource "aws_iam_role_policy" "service_linked_role" {
+  name = "Service_Linked_Role"
+  role = "eks-workshop-ide-role"
+
+  policy = <
+    Mounts:
+      /tmp from tmp-volume (rw)
+  Volumes:
+   tmp-volume:
+    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
+    Medium:     Memory
+    SizeLimit:  <unset>
+[...]
+```
+
+Looking at the [`Volumes`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir-configuration-example) section, we can see that the Deployment currently uses an [EmptyDir volume type](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) that exists only for the Pod's lifetime.
+
+![Assets with emptyDir](./assets/assets-emptydir.webp)
+
+An `emptyDir` volume is created when a Pod is assigned to a node and persists only while that Pod runs on that node. As its name suggests, the volume starts empty. While all containers within the Pod can read and write files in the emptyDir volume (even when mounted at different paths), **when a Pod is removed from a node for any reason, the data in the emptyDir is deleted permanently.** This makes EmptyDir unsuitable for sharing data between multiple Pods in the same Deployment when that data needs to persist.
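+
+For reference, the sketch below shows roughly how an `emptyDir` volume is declared. This is a minimal, hypothetical Pod (it is not part of the workshop manifests, and the `emptydir-demo` name and busybox image are only illustrative) in which two containers share the same scratch directory; delete the Pod and the data is gone with it:
+
+```bash
+$ cat <<'EOF' | kubectl apply -f -
+apiVersion: v1
+kind: Pod
+metadata:
+  name: emptydir-demo
+spec:
+  volumes:
+    # Scratch space that lives and dies with the Pod
+    - name: cache
+      emptyDir: {}
+  containers:
+    - name: writer
+      image: public.ecr.aws/docker/library/busybox:stable
+      command: ["sh", "-c", "echo hello > /cache/hello.txt && sleep 3600"]
+      volumeMounts:
+        - name: cache
+          mountPath: /cache
+    - name: reader
+      image: public.ecr.aws/docker/library/busybox:stable
+      command: ["sh", "-c", "sleep 3600"]
+      volumeMounts:
+        - name: cache
+          mountPath: /cache
+EOF
+```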
+
+The container comes with some initial product images, which are copied during the build process to `/usr/share/nginx/html/assets`. We can verify this by running:
+
+```bash
+$ kubectl exec --stdin deployment/assets \
+  -n assets -- bash -c "ls /usr/share/nginx/html/assets/"
+chrono_classic.jpg
+gentleman.jpg
+pocket_watch.jpg
+smart_1.jpg
+smart_2.jpg
+wood_watch.jpg
+```
+
+To demonstrate the limitations of EmptyDir storage, let's scale up the `assets` Deployment to multiple replicas:
+
+```bash
+$ kubectl scale -n assets --replicas=2 deployment/assets
+deployment.apps/assets scaled
+
+$ kubectl rollout status -n assets deployment/assets --timeout=60s
+deployment "assets" successfully rolled out
+```
+
+Now, let's add a new product image called `divewatch.jpg` to the `/usr/share/nginx/html/assets` directory of the first Pod and verify it exists:
+
+```bash
+$ POD_NAME=$(kubectl -n assets get pods -o jsonpath='{.items[0].metadata.name}')
+$ kubectl exec --stdin $POD_NAME \
+  -n assets -- bash -c 'touch /usr/share/nginx/html/assets/divewatch.jpg'
+$ kubectl exec --stdin $POD_NAME \
+  -n assets -- bash -c 'ls /usr/share/nginx/html/assets'
+chrono_classic.jpg
+divewatch.jpg <-----------
+gentleman.jpg
+pocket_watch.jpg
+smart_1.jpg
+smart_2.jpg
+wood_watch.jpg
+```
+
+Let's check if the new product image `divewatch.jpg` appears in the second Pod:
+
+```bash
+$ POD_NAME=$(kubectl -n assets get pods -o jsonpath='{.items[1].metadata.name}')
+$ kubectl exec --stdin $POD_NAME \
+  -n assets -- bash -c 'ls /usr/share/nginx/html/assets'
+chrono_classic.jpg
+gentleman.jpg
+pocket_watch.jpg
+smart_1.jpg
+smart_2.jpg
+wood_watch.jpg
+```
+
+As we can see, `divewatch.jpg` doesn't exist in the second Pod. This demonstrates why we need a shared filesystem that persists across multiple Pods when scaling horizontally, allowing file updates without requiring redeployment.
diff --git a/website/docs/fundamentals/storage/fsx-for-lustre/20-introduction-to-fsxl.md b/website/docs/fundamentals/storage/fsx-for-lustre/20-introduction-to-fsxl.md
new file mode 100644
index 0000000000..b4e03c6b48
--- /dev/null
+++ b/website/docs/fundamentals/storage/fsx-for-lustre/20-introduction-to-fsxl.md
@@ -0,0 +1,119 @@
+---
+title: FSx for Lustre Setup
+sidebar_position: 20
+---
+
+Before proceeding with this section, it's important to understand the Kubernetes storage concepts (volumes, persistent volumes (PV), persistent volume claims (PVC), dynamic provisioning, and ephemeral storage) that were covered in the [Storage](../index.md) main section.
+
+The [Amazon FSx for Lustre Container Storage Interface (CSI) driver](https://github.com/kubernetes-sigs/aws-fsx-csi-driver) enables Kubernetes applications to access files in an FSx for Lustre file system. The driver implements the [CSI](https://github.com/container-storage-interface/spec/blob/master/spec.md) specification, allowing container orchestrators (CO) to manage storage volumes effectively.
+
+The following architecture diagram illustrates how we will use FSx for Lustre linked with an Amazon S3 bucket as persistent storage for our EKS pods:
+
+![Assets with FSx for Lustre](./assets/assets-fsxl.webp)
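+
+The steps below reference the existing FSx for Lustre file system through the `$FSX_ID` environment variable and the S3 bucket through `$BUCKET_NAME`. If you'd like to take a quick look at the file system before wiring anything up, a query along these lines works (the selected fields are only a suggestion, and the output will vary by environment):
+
+```bash
+$ aws fsx describe-file-systems --file-system-ids $FSX_ID \
+  --query 'FileSystems[0].{Lifecycle:Lifecycle,StorageCapacity:StorageCapacity,DeploymentType:LustreConfiguration.DeploymentType}'
+```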
+
+Let's begin by creating a [Data Repository Association (DRA)](https://docs.aws.amazon.com/fsx/latest/LustreGuide/create-dra-linked-data-repo.html) between the FSx for Lustre file system and an S3 bucket. This will create the association and notify you when it is complete, which takes approximately eight minutes:
+
+```bash
+$ ASSOCIATION_ID=$(aws fsx create-data-repository-association \
+  --file-system-id $FSX_ID \
+  --file-system-path "/" \
+  --data-repository-path "s3://$BUCKET_NAME" \
+  --s3 "AutoImportPolicy={Events=[NEW,CHANGED,DELETED]},AutoExportPolicy={Events=[NEW,CHANGED,DELETED]}" \
+  --query 'Association.AssociationId' \
+  --output text)
+
+echo "Creating Data Repository Association..."
+
+while true; do
+  STATUS=$(aws fsx describe-data-repository-associations --association-ids $ASSOCIATION_ID --query 'Associations[0].Lifecycle' --output text)
+
+  if [ "$STATUS" = "AVAILABLE" ]; then
+    echo "Data Repository Association is now AVAILABLE."
+    break
+  elif [ "$STATUS" = "FAILED" ]; then
+    echo "Data Repository Association creation FAILED."
+    break
+  fi
+  sleep 5
+done
+Creating Data Repository Association...
+Data Repository Association is now AVAILABLE.
+$
+```
+
+Now that the S3 data repository association has been created, let's create a staging directory with the images needed in our watch store scenario:
+
+```bash
+$ mkdir ~/environment/assets-images/
+$ cd ~/environment/assets-images/
+$ curl --remote-name-all https://raw.githubusercontent.com/aws-containers/retail-store-sample-app/main/src/assets/public/assets/{chrono_classic.jpg,gentleman.jpg,pocket_watch.jpg,smart_2.jpg,wood_watch.jpg}
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+100    14  100    14    0     0     61      0 --:--:-- --:--:-- --:--:--    62
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+100    14  100    14    0     0    100      0 --:--:-- --:--:-- --:--:--   100
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+100    14  100    14    0     0    133      0 --:--:-- --:--:-- --:--:--   133
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+100    14  100    14    0     0    103      0 --:--:-- --:--:-- --:--:--   103
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+100    14  100    14    0     0    108      0 --:--:-- --:--:-- --:--:--   109
+$
+```
+
+Next, we'll copy these image assets to our S3 bucket using the `aws s3 cp` command:
+
+```bash
+$ cd ~/environment/
+$ aws s3 cp ~/environment/assets-images/ s3://$BUCKET_NAME/ --recursive
+upload: assets-images/wood_watch.jpg to s3://eks-workshop-s3-data20250619213912331100000003/wood_watch.jpg
+upload: assets-images/smart_2.jpg to s3://eks-workshop-s3-data20250619213912331100000003/smart_2.jpg
+upload: assets-images/pocket_watch.jpg to s3://eks-workshop-s3-data20250619213912331100000003/pocket_watch.jpg
+upload: assets-images/chrono_classic.jpg to s3://eks-workshop-s3-data20250619213912331100000003/chrono_classic.jpg
+upload: assets-images/gentleman.jpg to s3://eks-workshop-s3-data20250619213912331100000003/gentleman.jpg
+$
+```
+
+We can verify the uploaded objects in our bucket using the `aws s3 ls` command:
+
+```bash
+$ aws s3 ls $BUCKET_NAME
+2024-10-14 19:29:05      98157 chrono_classic.jpg
+2024-10-14 19:29:05      58439 gentleman.jpg
+2024-10-14 19:29:05      58655 pocket_watch.jpg
+2024-10-14 19:29:05      20795 smart_2.jpg
+2024-10-14 19:29:05      43122 wood_watch.jpg
+$
+```
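+
+Because the DRA was created with an `AutoImportPolicy` covering `NEW`, `CHANGED` and `DELETED` events, these objects will surface in the Lustre file system automatically without any further action on our part. If you want to double-check the import and export policies attached to the association, one optional way to do so is:
+
+```bash
+$ aws fsx describe-data-repository-associations \
+  --association-ids $ASSOCIATION_ID \
+  --query 'Associations[0].S3'
+```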
+
+With our initial objects now in the Amazon S3 bucket, we can configure the FSx for Lustre CSI driver and add it to our EKS cluster. This operation will take a few minutes to complete:
+
+```bash
+$ helm repo add aws-fsx-csi-driver https://kubernetes-sigs.github.io/aws-fsx-csi-driver/
+"aws-fsx-csi-driver" has been added to your repositories
+$ helm upgrade --install aws-fsx-csi-driver \
+  --namespace kube-system \
+  aws-fsx-csi-driver/aws-fsx-csi-driver
+Release "aws-fsx-csi-driver" does not exist. Installing it now.
+NAME: aws-fsx-csi-driver
+LAST DEPLOYED: Thu Jun 19 22:10:58 2025
+NAMESPACE: kube-system
+STATUS: deployed
+REVISION: 1
+TEST SUITE: None
+$
+```
+
+Once completed, we can verify what the addon created in our EKS cluster:
+
+```bash
+$ kubectl get daemonset fsx-csi-node -n kube-system
+NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
+fsx-csi-node   3         3         0       3            0           kubernetes.io/os=linux   5s
+$
+```
diff --git a/website/docs/fundamentals/storage/fsx-for-lustre/30-fsxl-and-s3-dra.md b/website/docs/fundamentals/storage/fsx-for-lustre/30-fsxl-and-s3-dra.md
new file mode 100644
index 0000000000..ac0c077302
--- /dev/null
+++ b/website/docs/fundamentals/storage/fsx-for-lustre/30-fsxl-and-s3-dra.md
@@ -0,0 +1,171 @@
+---
+title: FSx for Lustre with S3 DRA
+sidebar_position: 30
+---
+
+In our previous steps, we prepared our environment by creating a staging directory for image objects, downloading image assets, and uploading them to the S3 bucket that is linked to our FSx for Lustre file system through the DRA. We also installed and configured the FSx for Lustre CSI driver. Now we'll complete our objective of creating an image host application with **horizontal scaling** and **persistent storage** backed by Amazon FSx for Lustre by attaching our Pods to the Persistent Volume (PV) provided by the FSx for Lustre CSI driver.
+
+Let's start by creating a [Persistent Volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) and modifying the `assets` container in our deployment to mount this volume.
+
+First, let's examine the `fsxpvclaim.yaml` file to understand its parameters and configuration:
+
+::yaml{file="manifests/modules/fundamentals/storage/fsxl/deployment/fsxpvclaim.yaml"}
+
+```kustomization
+modules/fundamentals/storage/fsxl/deployment/deployment.yaml
+Deployment/assets
+```
+
+Now let's apply this configuration and redeploy our application:
+
+```bash
+$ kubectl kustomize ~/environment/eks-workshop/modules/fundamentals/storage/fsxl/deployment \
+  | envsubst | kubectl apply -f-
+namespace/assets unchanged
+serviceaccount/assets unchanged
+configmap/assets unchanged
+service/assets unchanged
+persistentvolume/fsx-pv created
+persistentvolumeclaim/fsx-claim created
+deployment.apps/assets configured
+```
+
+We'll monitor the deployment progress:
+
+```bash
+$ kubectl rollout status --timeout=120s deployment/assets -n assets
+Waiting for deployment "assets" rollout to finish: 1 old replicas are pending termination...
+Waiting for deployment "assets" rollout to finish: 1 old replicas are pending termination...
+deployment "assets" successfully rolled out
+```
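+
+The `fsx-pv` volume created above follows the static provisioning pattern: rather than asking the driver to create a file system on demand, the PV references the existing file system directly through the FSx for Lustre CSI driver (`fsx.csi.aws.com`). If you're curious which CSI attributes it carries, such as the file system ID, DNS name, and Lustre mount name, you can inspect them with the command below; the exact values will differ in your environment:
+
+```bash
+$ kubectl get pv fsx-pv -o yaml | yq '.spec.csi'
+```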
+deployment "assets" successfully rolled out +``` + +Let's verify our volume mounts, noting the new `/fsx-lustre` mount point: + +```bash +$ kubectl get deployment -n assets \ + -o yaml | yq '.items[].spec.template.spec.containers[].volumeMounts' +- mountPath: /fsx-lustre + name: fsx-lustre +- mountPath: /tmp + name: tmp-volume +``` + +Examine our newly created PersistentVolume: + +```bash +$ kubectl get pv +AME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE +fsx-pv 1200Gi RWX Retain Bound assets/fsx-claim 56s +``` + +Review the PersistentVolumeClaim details: + +```bash +$ kubectl describe pvc -n assets +Name: fsx-claim +Namespace: assets +StorageClass: +Status: Bound +Volume: fsx-pv +Labels: +Annotations: pv.kubernetes.io/bind-completed: yes + pv.kubernetes.io/bound-by-controller: yes +Finalizers: [kubernetes.io/pvc-protection] +Capacity: 1200Gi +Access Modes: RWX +VolumeMode: Filesystem +Used By: assets-654d866dc8-hrcml + assets-654d866dc8-w8thw +Events: +``` + +Verify our running pods: + +```bash +$ kubectl get pods -n assets +NAME READY STATUS RESTARTS AGE +assets-9fbbbcd6f-c74vv 1/1 Running 0 2m36s +assets-9fbbbcd6f-vb9jz 1/1 Running 0 2m38s +``` + +Let's examine our final deployment configuration with the Mountpoint for Amazon S3 CSI driver: + +```bash +$ kubectl describe deployment -n assets +Name: assets +Namespace: assets +[...] + Containers: + assets: + Image: public.ecr.aws/aws-containers/retail-store-sample-assets:0.4.0 + Port: 8080/TCP + Host Port: 0/TCP + Limits: + memory: 128Mi + Requests: + cpu: 128m + memory: 128Mi + Liveness: http-get http://:8080/health.html delay=0s timeout=1s period=3s #success=1 #failure=3 + Environment Variables from: + assets ConfigMap Optional: false + Environment: + Mounts: + /fsx-lustre from fsx-lustre (rw) + /tmp from tmp-volume (rw) + Volumes: + fsx-lustre: + Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) + ClaimName: fsx-claim + ReadOnly: false + tmp-volume: + Type: EmptyDir (a temporary directory that shares a pod's lifetime) + Medium: Memory + SizeLimit: +[...] +``` + +Now let's demonstrate the shared storage functionality. 
+
+Now let's demonstrate the shared storage functionality. First, we'll list the files in the first pod:
+
+```bash
+$ POD_1=$(kubectl -n assets get pods -o jsonpath='{.items[0].metadata.name}')
+$ kubectl exec --stdin $POD_1 -n assets -- bash -c 'ls /fsx-lustre/'
+chrono_classic.jpg
+gentleman.jpg
+pocket_watch.jpg
+smart_2.jpg
+wood_watch.jpg
+```
+
+Now let's create a new file called `divewatch.png` and upload it to the S3 bucket that backs our FSx for Lustre file system through the DRA:
+
+```bash
+$ touch divewatch.png && aws s3 cp divewatch.png s3://$BUCKET_NAME/
+upload: ./divewatch.png to s3://eks-workshop-s3-data20250619213912331100000003/divewatch.png
+```
+
+We can now verify that the new file `divewatch.png` is visible from our first pod:
+
+```bash
+$ kubectl exec --stdin $POD_1 -n assets -- bash -c 'ls /fsx-lustre/'
+chrono_classic.jpg
+divewatch.png <-----------
+gentleman.jpg
+pocket_watch.jpg
+smart_2.jpg
+wood_watch.jpg
+```
+
+To verify the persistence and sharing of our storage layer, let's check the second pod for the file we just created:
+
+```bash
+$ POD_2=$(kubectl -n assets get pods -o jsonpath='{.items[1].metadata.name}')
+$ kubectl exec --stdin $POD_2 -n assets -- bash -c 'ls /fsx-lustre/'
+chrono_classic.jpg
+divewatch.png <-----------
+gentleman.jpg
+pocket_watch.jpg
+smart_2.jpg
+wood_watch.jpg
+```
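+
+The association works in the other direction as well: because the DRA was created with an `AutoExportPolicy` covering `NEW`, `CHANGED` and `DELETED` events, files written to `/fsx-lustre` from a Pod are exported back to the S3 bucket. As an optional experiment (the `chronograph.jpg` file name is just an example), create a file from inside a Pod and look for it in the bucket; the export is asynchronous, so it may take a few moments to appear:
+
+```bash
+# Write a new file through the Lustre mount, then check the S3 data repository
+$ kubectl exec --stdin $POD_1 -n assets -- bash -c 'touch /fsx-lustre/chronograph.jpg'
+$ aws s3 ls $BUCKET_NAME
+```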
+
+With that we've successfully demonstrated how we can use Amazon FSx for Lustre, linked to an S3 data repository, as persistent shared storage for workloads running on EKS.
\ No newline at end of file
diff --git a/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-emptydir.webp b/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-emptydir.webp
new file mode 100644
index 0000000000..e88c42fa50
Binary files /dev/null and b/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-emptydir.webp differ
diff --git a/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-fsxl.webp b/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-fsxl.webp
new file mode 100644
index 0000000000..4333eee0b3
Binary files /dev/null and b/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-fsxl.webp differ
diff --git a/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-s3.webp b/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-s3.webp
new file mode 100644
index 0000000000..0e3ef3884d
Binary files /dev/null and b/website/docs/fundamentals/storage/fsx-for-lustre/assets/assets-s3.webp differ
diff --git a/website/docs/fundamentals/storage/fsx-for-lustre/index.md b/website/docs/fundamentals/storage/fsx-for-lustre/index.md
new file mode 100644
index 0000000000..58b7511d92
--- /dev/null
+++ b/website/docs/fundamentals/storage/fsx-for-lustre/index.md
@@ -0,0 +1,36 @@
+---
+title: Amazon FSx for Lustre
+sidebar_position: 35
+sidebar_custom_props: { "module": true }
+description: "Amazon FSx for Lustre is a fully managed service that provides high-performance, cost-effective, and scalable storage powered by Lustre, the world’s most popular high-performance file system"
+---
+
+::required-time
+
+:::tip Before you start
+Prepare your environment for this section:
+
+```bash timeout=1800 wait=30
+$ prepare-environment fundamentals/storage/fsxl
+```
+
+This will make the following changes to your lab environment:
+
+- Create an IAM role for the Amazon FSx for Lustre CSI driver
+- Create an Amazon Simple Storage Service (S3) bucket for use in the workshop
+
+You can view the Terraform that applies these changes [here](https://github.com/VAR::MANIFESTS_OWNER/VAR::MANIFESTS_REPOSITORY/tree/VAR::MANIFESTS_REF/manifests/modules/fundamentals/storage/fsxl/.workshop/terraform).
+
+:::
+
+[Amazon FSx for Lustre](https://aws.amazon.com/fsx/lustre/) is a fully managed service that provides high-performance, cost-effective, and scalable storage powered by Lustre, the world’s most popular high-performance file system. FSx for Lustre provides the fastest storage performance for GPU instances in the cloud with up to terabytes per second of throughput, millions of IOPS, sub-millisecond latencies, and virtually unlimited storage capacity. It delivers up to 34% better price performance compared to on-premises HDD file storage and up to 70% better price performance compared to other cloud-based Lustre storage.
+
+The [Amazon FSx for Lustre Container Storage Interface (CSI) driver](https://github.com/kubernetes-sigs/aws-fsx-csi-driver) provides a CSI interface that allows Amazon EKS clusters to manage the lifecycle of Amazon FSx for Lustre file systems.
+
+In this lab, we will create an Amazon FSx for Lustre file system to provide persistent, shared storage for our EKS cluster. The FSx for Lustre file system uses an [S3](https://aws.amazon.com/s3/) bucket as the data repository, and a [Data Repository Association (DRA)](https://docs.aws.amazon.com/fsx/latest/LustreGuide/create-dra-linked-data-repo.html) will be created between the Lustre file system and the S3 bucket.
+
+We will cover the following topics:
+
+- Ephemeral Container Storage
+- Introduction to FSx for Lustre
+- FSx for Lustre with S3 DRA
\ No newline at end of file