added new tool to scale up-down nodes on an instance group #708

Open
wants to merge 2 commits into
base: main
1 change: 1 addition & 0 deletions 1.architectures/0.common/README.md
@@ -12,3 +12,4 @@ This template creates a S3 Bucket with all public access disabled. To deploy it,
This template deploys a stack to receive human-readable email notifications for HyperPod cluster status changes and node health events. See the [workshop page](https://catalog.workshops.aws/sagemaker-hyperpod/en-US/07-tips-and-tricks/26-event-bridge) for more details.
Contributor

Kindly remove this file + directory.

Collaborator Author

@KeitaW these files were already there and are not part of this PR.

If you want, I can open a new PR for moving the files that were originally there too, as those have not been modified. The reason is that other assets, such as workshops, might have links to those files and moving them will break these assets.

Contributor

ah okay, my bad.


[<kbd> <br> 1-Click Deploy 🚀 <br> </kbd>](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://ws-assets-prod-iad-r-iad-ed304a55c2ca1aee.s3.us-east-1.amazonaws.com/e3752eec-63b5-4033-9720-fa68d35164e9/hyperpod-event-bridge-email.yaml&stackName=hyperpod-event-bridge-email)

6 changes: 6 additions & 0 deletions 1.architectures/5.sagemaker-hyperpod/tools/README.md
@@ -10,3 +10,9 @@ Utility to dump details of all nodes in a cluster, into a csv file.
**Usage:** `python dump_cluster_nodes_info.py --cluster-name <name-of-cluster-whose-node-details-are-needed>`

**Output:** “nodes.csv” file in the current directory, containing details of all nodes in the cluster
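
For readers who have not opened the script, here is a minimal sketch of how such a dump could work. It is not the code of `dump_cluster_nodes_info.py`. It assumes the SageMaker `list_cluster_nodes` call with `NextToken` pagination and a `ClusterNodeSummaries` response list; the function name `dump_nodes` and the chosen CSV columns are hypothetical.

```python
import csv


def dump_nodes(client, cluster_name, path="nodes.csv"):
    """Write one CSV row per cluster node and return the number of rows.

    `client` is expected to behave like boto3.client("sagemaker").
    The column selection is an assumption made for illustration only.
    """
    fields = ["InstanceGroupName", "InstanceId", "InstanceType", "Status"]
    kwargs = {"ClusterName": cluster_name}
    rows = []
    while True:
        page = client.list_cluster_nodes(**kwargs)
        for node in page.get("ClusterNodeSummaries", []):
            rows.append({
                "InstanceGroupName": node.get("InstanceGroupName"),
                "InstanceId": node.get("InstanceId"),
                "InstanceType": node.get("InstanceType"),
                "Status": node.get("InstanceStatus", {}).get("Status"),
            })
        token = page.get("NextToken")
        if not token:
            break  # no further pages
        kwargs["NextToken"] = token
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With credentials configured, a call might look like `dump_nodes(boto3.client("sagemaker"), "my-cluster")`, where `my-cluster` is a placeholder name.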

## Create a scheduler to scale up and down the number of nodes in an instance group

This template deploys an AWS Lambda function that is triggered by an Amazon EventBridge rule to scale the number of nodes in an instance group up or down based on a cron expression.

[<kbd> <br> 1-Click Deploy 🚀 <br> </kbd>](https://ws-assets-prod-iad-r-iad-ed304a55c2ca1aee.s3.us-east-1.amazonaws.com/2433d39e-ccfe-4c00-9d3d-9917b729258e/update-instance-group-instance-count.yaml)
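
The template itself is not shown on this page, so the following is only a sketch of what a Lambda handler for this kind of scheduled scaling might look like, not the function shipped in the PR. It assumes the EventBridge rule passes a constant JSON input with the hypothetical keys `cluster_name`, `instance_group_name`, and `target_count`, and that the SageMaker `describe_cluster` and `update_cluster` calls share the per-group fields listed in `CARRY_OVER`. Verify both assumptions against the API reference before relying on it.

```python
# Per-group fields assumed to appear in describe_cluster output and to be
# accepted by update_cluster (assumption; check the API reference).
CARRY_OVER = (
    "InstanceGroupName",
    "InstanceType",
    "LifeCycleConfig",
    "ExecutionRole",
    "ThreadsPerCore",
    "InstanceStorageConfigs",
)


def build_instance_groups(described_groups, group_name, target_count):
    """Return update specs: only `group_name` gets a new InstanceCount."""
    specs, found = [], False
    for group in described_groups:
        spec = {k: group[k] for k in CARRY_OVER if k in group}
        if group["InstanceGroupName"] == group_name:
            spec["InstanceCount"] = target_count
            found = True
        else:
            # Keep every other group at its current target size.
            spec["InstanceCount"] = group["TargetCount"]
        specs.append(spec)
    if not found:
        raise ValueError(f"instance group {group_name!r} not found")
    return specs


def lambda_handler(event, context, client=None):
    """Hypothetical handler; the event keys are illustrative, not the PR's."""
    if client is None:
        import boto3  # available in the Lambda Python runtime
        client = boto3.client("sagemaker")
    cluster = event["cluster_name"]
    group = event["instance_group_name"]
    target = int(event["target_count"])
    described = client.describe_cluster(ClusterName=cluster)
    groups = build_instance_groups(described["InstanceGroups"], group, target)
    client.update_cluster(ClusterName=cluster, InstanceGroups=groups)
    return {"cluster_name": cluster, "instance_group_name": group, "target_count": target}
```

Sending every group back with its existing target count is a conservative choice, so that a scheduled run changes only the named group. Two rules with different cron expressions and different `target_count` inputs would give a scale-up and a scale-down schedule.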