This repository was archived by the owner on Sep 16, 2025. It is now read-only.
136 changes: 136 additions & 0 deletions .circleci/config.yml
@@ -0,0 +1,136 @@
version: 2.1

orbs:
  aws-sagemaker: circleci/[email protected]

# For context on these parameters, see their usage below and the README.
parameters:
  bucket:
    type: string
    default: circleci-sagemaker-pipeline
  model_desc:
    type: string
    default: "Kitten Classifier allowing us to distinguish between giraffes and kittens."
  model_name:
    type: string
    default: kitten-classifier
  project_id:
    type: string
    default: "e47ee9b0-446f-44cf-bec8-5407ceb06930"
  region_name:
    type: string
    default: us-east-1

main-branch-only: &main-branch-only
  branches:
    only:
      - main

# This workflow demonstrates moving your SageMaker model from dev to production
workflows:
  deploy-model-through-to-prod:
    jobs:
      - aws-sagemaker/create_model:
          # Job name shown in the workflow DAG
          name: create-model
          # S3 bucket where assets will be stored
          bucket: << pipeline.parameters.bucket >>
          # Name of the model in SageMaker that we will be deploying.
          model_name: << pipeline.parameters.model_name >>
          # We use the pipeline.id as the unique identifier for some of the configs we create
          circle_pipeline_id: << pipeline.id >>
          # Region where we are deploying to
          region_name: << pipeline.parameters.region_name >>
          filters: *main-branch-only

      - aws-sagemaker/create_endpoint_configuration: # q: should this be create_endpoint_configuration?
          name: dev:create-model-endpoint-config
          bucket: << pipeline.parameters.bucket >>
          # Name of the environment you are working with. This is an arbitrary string that fits how you like to organize.
          deploy_environment: dev
          model_name: << pipeline.parameters.model_name >>
          circle_pipeline_id: << pipeline.id >>
          circle_project_id: << pipeline.parameters.project_id >>
          region_name: << pipeline.parameters.region_name >>
          requires:
            - create-model
          filters: *main-branch-only

      - aws-sagemaker/deploy_endpoint:
          name: dev:deploy-model-to-endpoint
          bucket: << pipeline.parameters.bucket >>
          deploy_environment: dev
          model_name: << pipeline.parameters.model_name >>
          # Description for the model. q: can we make it optional?
          model_desc: << pipeline.parameters.model_desc >>
          circle_pipeline_id: << pipeline.id >>
          # You can find this value in the Project Settings in CircleCI
          circle_project_id: << pipeline.parameters.project_id >>
          region_name: << pipeline.parameters.region_name >>
          requires:
            - dev:create-model-endpoint-config
          filters: *main-branch-only

      - promote-model-to-prod-endpoint:
          type: approval
          requires:
            - dev:deploy-model-to-endpoint
          filters: *main-branch-only

      - aws-sagemaker/create_endpoint_configuration:
          name: prod:create-model-endpoint-config
          bucket: << pipeline.parameters.bucket >>
          deploy_environment: prod
          model_name: << pipeline.parameters.model_name >>
          circle_pipeline_id: << pipeline.id >>
          circle_project_id: << pipeline.parameters.project_id >>
          region_name: << pipeline.parameters.region_name >>
          requires:
            - promote-model-to-prod-endpoint
          filters: *main-branch-only

      - aws-sagemaker/deploy_endpoint:
          name: prod:deploy-model-to-endpoint
          bucket: << pipeline.parameters.bucket >>
          deploy_environment: prod
          model_name: << pipeline.parameters.model_name >>
          model_desc: << pipeline.parameters.model_desc >>
          circle_pipeline_id: << pipeline.id >>
          circle_project_id: << pipeline.parameters.project_id >>
          region_name: << pipeline.parameters.region_name >>
          requires:
            - prod:create-model-endpoint-config
          filters: *main-branch-only

  # For model-train work. If you don't need to train a model to use the demo, you can delete this workflow.
  model-train:
    jobs:
      - kitten-model-train:
          filters:
            branches:
              only:
                - model-train

# For model-train work. If you don't need to train a model to use the demo, you can delete this job.
jobs:
  kitten-model-train:
    docker:
      - image: python:3.11
    environment:
      BUCKET_NAME: << pipeline.parameters.bucket >>
      REGION_NAME: << pipeline.parameters.region_name >>
      MODEL_NAME: << pipeline.parameters.model_name >>
      MODEL_DESC: << pipeline.parameters.model_desc >>
    steps:
      - checkout
      - run:
          name: install python dependencies
          command: pip install -r ./kitten_model/requirements.txt --upgrade
      - run:
          name: gather data
          command: python ./kitten_model/gather_data.py
      - run:
          name: train and register model
          command: python ./kitten_model/train_register.py


190 changes: 190 additions & 0 deletions README.md
@@ -0,0 +1,190 @@
![Model deployment orchestration](https://images.ctfassets.net/il1yandlcjgk/5xnNL9sZ3jQr9F6GtvjywP/66e7b94c4b960b3d3ba73ddf2b94943b/Sagemaker-circleci-repo-banner.png)

# Using AWS SageMaker Orb To Orchestrate Model Deployment Across Environments

## Pre-reqs

### Assumptions

* You have a model package in the SageMaker Studio Model Registry. We provide an easy way to train one; see the README in the `kitten_model` folder after you finish reviewing this document.
* You know how to set up an IAM OIDC provider and a trust relationship for a role.

### OIDC - Identity Provider

The Amazon SageMaker Orb authenticates with AWS using OIDC. You need to add CircleCI as an identity provider under IAM > Identity providers in your AWS account.

Skip this section if you already have this set up.

First, get your CircleCI Organization ID. Go to your Organization Settings in CircleCI and copy your Organization ID.

![Organization Settings page in CircleCI](https://images.ctfassets.net/il1yandlcjgk/1VVWYWy9vyFRStkwnXOo4m/b17e167fa649c9151fc494cc9be3223e/OIDC-CCI-GET-ORG-ID.png)

Now go to your AWS Management Console. Go to IAM > Access management > Identity providers. Select Add Provider.

![Identity providers management panel](https://images.ctfassets.net/il1yandlcjgk/3vtHDfDCVb0J1mdNsIIh6y/d4e2f44f39ebff2cd0d0077428bec276/OIDC-IDENTITY-PROVIDERS.png)

Enter the Provider URL and Audience as follows.

**Provider URL**: Enter `https://oidc.circleci.com/org/<your-organization-id>`, where `your-organization-id` is the ID of your CircleCI organization.

**Audience**: Enter your organization ID.

![Add an identity provider screen](https://images.ctfassets.net/il1yandlcjgk/670HDmxgHiLf9US5PVA4bU/28a5088493d1e400688ead79606215d4/OIDC-ADD-IDENTITY-PROVIDER.png)

Click `Get Thumbprint`, then `Add Provider`.

See the guide on [Using OIDC tokens in jobs](https://circleci.com/docs/openid-connect-tokens/#aws) for more detail.
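
If you would rather script this step than click through the console, the same two values can be assembled in code. The sketch below only builds the request arguments and does not call AWS. The helper name `oidc_provider_request`, the sample organization ID, and the thumbprint value are all placeholders introduced here for illustration; the dictionary keys follow the keyword arguments of boto3's `create_open_id_connect_provider`.

```python
def oidc_provider_request(org_id: str, thumbprint: str) -> dict:
    """Build the arguments for registering CircleCI as an IAM OIDC provider.

    Hypothetical helper: it mirrors the Provider URL and Audience fields
    described above and nothing more.
    """
    return {
        "Url": f"https://oidc.circleci.com/org/{org_id}",
        "ClientIDList": [org_id],        # the audience is the organization ID
        "ThumbprintList": [thumbprint],  # supply the value AWS reports for the provider
    }


request = oidc_provider_request(
    org_id="00000000-0000-0000-0000-000000000000",  # placeholder organization ID
    thumbprint="0" * 40,                             # placeholder, not a real thumbprint
)
print(request["Url"])

# With AWS credentials available, the call would look like:
#   boto3.client("iam").create_open_id_connect_provider(**request)
```

Keeping the request construction separate from the AWS call makes it easy to review the URL and audience before anything is created in your account.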

### Role

You will need an IAM role with the following permissions policy.

**Note**: We have organized the permissions into two groups. The OrbPermissions and S3Access statements are used to deploy the model to the endpoints. The S3AccessTrainModel and SageMakerTrainModel statements are only needed if you want to train the demo model we provide.

Update the S3 bucket information to match your setup.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OrbPermissions",
      "Effect": "Allow",
      "Action": [
        "sagemaker:AddTags",
        "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateModel",
        "sagemaker:DescribeEndpoint",
        "sagemaker:DescribeEndpointConfig",
        "sagemaker:ListEndpoints",
        "sagemaker:ListModelPackages",
        "sagemaker:ListTags",
        "sagemaker:UpdateEndpoint",
        "iam:PassRole"
      ],
      "Resource": "*"
    },
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::circleci-sagemaker-pipeline/*"
      ]
    },
    {
      "Sid": "S3AccessTrainModel",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::sagemaker-sample-files/*",
        "arn:aws:s3:::circleci-sagemaker-pipeline",
        "arn:aws:s3:::circleci-sagemaker-pipeline/*"
      ]
    },
    {
      "Sid": "SageMakerTrainModel",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "logs:DescribeLogStreams",
        "sagemaker:ListModelPackageGroups",
        "sagemaker:CreateModelPackage",
        "sagemaker:UpdateModelPackage"
      ],
      "Resource": "*"
    }
  ]
}
```
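
When adapting the bucket names, it can help to generate the S3 statement rather than edit ARNs by hand. The following is a minimal sketch, not part of the orb: the helper `s3_read_statement` and the bucket name `my-model-bucket` are hypothetical. Note one deliberate difference from the S3Access statement above: the sketch lists the bucket ARN as well as the object ARN, on the assumption that you want `s3:ListBucket` (a bucket-level action) to be granted by this statement alone.

```python
import json


def s3_read_statement(bucket: str) -> dict:
    """Return an S3Access-style statement scoped to one bucket (illustrative)."""
    return {
        "Sid": "S3Access",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{bucket}",    # bucket ARN, matched by s3:ListBucket
            f"arn:aws:s3:::{bucket}/*",  # object ARNs, matched by s3:GetObject
        ],
    }


statement = s3_read_statement("my-model-bucket")  # hypothetical bucket name
print(json.dumps(statement, indent=2))
```

The printed JSON can be pasted over the corresponding statement in the policy.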

Then set up the trust relationship between the role and the CircleCI OIDC provider. Here is an example policy. **Note**: you must replace the placeholders `<AWS-ACCOUNT-ID>`, `<CIRCLECI-ORG-ID>`, and `<CIRCLECI-PROJECT-ID>` with your own values.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<AWS-ACCOUNT-ID>:oidc-provider/oidc.circleci.com/org/<CIRCLECI-ORG-ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.circleci.com/org/<CIRCLECI-ORG-ID>:sub": "org/<CIRCLECI-ORG-ID>/project/<CIRCLECI-PROJECT-ID>/user/*"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
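
Because the organization ID appears in three places in the trust policy, filling in the placeholders by hand is error-prone. As a rough sketch, the policy can be rendered from the three values instead. The function name `trust_policy` and the sample IDs below are illustrative assumptions, not values from this project.

```python
import json


def trust_policy(account_id: str, org_id: str, project_id: str) -> dict:
    """Render the example trust policy with the placeholders filled in."""
    provider = f"oidc.circleci.com/org/{org_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets CircleCI jobs from one project assume the role via OIDC
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringLike": {
                        f"{provider}:sub": f"org/{org_id}/project/{project_id}/user/*"
                    }
                },
            },
            {
                # Lets SageMaker itself assume the role
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            },
        ],
    }


# Sample values only; substitute your own account, organization, and project IDs.
policy = trust_policy("123456789012", "my-org-id", "my-project-id")
print(json.dumps(policy, indent=2))
```

The `StringLike` condition restricts the role to tokens issued for a single project, so each project that deploys models needs its own entry or its own role.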

### Environment Variables

The orb needs the following environment variables to function. Configure them either at the project level or using organization contexts. [Guide on setting Environment Variables in CircleCI](https://circleci.com/docs/set-environment-variable/).

`SAGEMAKER_EXECUTION_ROLE_ARN` (required): The ARN of the role you configured with the necessary SageMaker permissions and the OIDC trust relationship.

`CCI_RELEASE_INTEGRATION_TOKEN` (optional): The [CircleCI Releases](https://app.circleci.com/releases) page offers you a single pane of glass to monitor all your deployments across environments. You can view deployment progress in real time, see what versions are currently deployed, and navigate easily to the SageMaker console. To create a Release Integration Token, see the section [Setting up a Release Integration Token](#setting-up-a-release-integration-token).
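
A job can fail early with a clear message if a required variable is missing. Here is a minimal sketch of such a check; the function `missing_required` and the sample ARN are hypothetical and not part of the orb.

```python
REQUIRED = ("SAGEMAKER_EXECUTION_ROLE_ARN",)
OPTIONAL = ("CCI_RELEASE_INTEGRATION_TOKEN",)


def missing_required(env) -> list:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


# Sample environments for illustration; in a job you would pass os.environ.
configured = {"SAGEMAKER_EXECUTION_ROLE_ARN": "arn:aws:iam::123456789012:role/example"}
unconfigured = {"CCI_RELEASE_INTEGRATION_TOKEN": "token-value"}

print(missing_required(configured))    # []
print(missing_required(unconfigured))  # ['SAGEMAKER_EXECUTION_ROLE_ARN']
```

Only the required variable is checked, since the release token is optional.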

## Orb Parameters

`bucket` - This is the S3 bucket where resources will be stored.

`deploy_environment` - The name of the environment you are working with. This is an arbitrary string that works for how you like to organize your model deploys. Can be 'dev' or 'prod', for example.

`model_desc` - A description for the model to be deployed.

`model_name` - The name of the model in SageMaker that we will be deploying.

`circle_pipeline_id` - The pipeline ID, used as a unique identifier for some of the configurations we create. Format: `<< pipeline.id >>`

`circle_project_id` - Found in the Project Settings in CircleCI. Used for specifying the project that triggered this deployment.

`region_name` - The AWS Region where the deployment happens, e.g. `us-east-1`.

For the full range of options, consult the circleci/aws-sagemaker orb [documentation](https://circleci.com/developer/orbs/orb/circleci/aws-sagemaker#jobs).

## Setting up a Release Integration Token

First, set up a Release Integration Token so you can use the CircleCI UI to monitor your releases. (Note that you must be an org admin to do this.) Navigate to the **Releases** section. Select **Add Release Environment**.

![Blank Releases](https://images.ctfassets.net/il1yandlcjgk/4zP2grQuNff9Zgoj35VnPN/dc6254fe184bf817ca53b4d60433e74e/blank-releases.png)

Select `Amazon SageMaker`, add a Name and Create Environment.

![Create new environment](https://images.ctfassets.net/il1yandlcjgk/36jJ5EjIMpEJjq2EaRZlJd/2df4856a7c810f4f9ac8e4e0a0068462/modal-create-new-environment.png)

Select your Environment:

![Release environment](https://images.ctfassets.net/il1yandlcjgk/1DwzNcayuWUfRLTnbq9u2L/2fe393977cc4f0bff8a23c5fddce14dd/release-env.png)

And click on **Create New Token**.

![Create new token](https://images.ctfassets.net/il1yandlcjgk/4QF3GoDCjIOVgAiUnOe8A5/b88084517074af9f6116723704fe8891/release-create-key.png)

Save this token for later; it goes into the `CCI_RELEASE_INTEGRATION_TOKEN` environment variable.


## Support

Stuck? Need help? Visit our [forums](https://discuss.circleci.com/), contact us directly at [[email protected]](mailto:[email protected]), or come visit on [Discord](https://discord.com/invite/UWsWB44zYj).

15 changes: 15 additions & 0 deletions kitten_model/README.md
@@ -0,0 +1,15 @@
# Get a quick model going

This document assumes you have already done the OIDC/IAM setup in the main README.

To use this to generate a model package to work with, go to SageMaker and set up a Studio.

Then you can push this branch up and run the associated workflow.

When you push the branch `model-train`, the workflow in this demo creates a model in the Amazon SageMaker Model Package Registry.

Any subsequent run of this workflow creates a new version of the model. You can push up a dummy change, or trigger the branch workflow in CircleCI.
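
To confirm that a run registered a new version, you can look up the newest package in the registry. The sketch below is illustrative only: `latest_model_package` is a hypothetical helper, the stub client exists so the example runs without AWS, and it assumes the model package group is named after `MODEL_NAME` (check `train_register.py` for the actual group name). The real call it wraps is the SageMaker client's `list_model_packages`.

```python
def latest_model_package(sagemaker_client, group_name: str):
    """Return the newest model package summary in a group, or None if empty."""
    response = sagemaker_client.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["ModelPackageSummaryList"]
    return summaries[0] if summaries else None


class StubSageMakerClient:
    """Stand-in for a boto3 SageMaker client so the sketch runs offline."""

    def list_model_packages(self, **kwargs):
        # Made-up response shaped like the real API's summary list
        return {
            "ModelPackageSummaryList": [
                {"ModelPackageVersion": 2, "ModelApprovalStatus": "Approved"}
            ]
        }


latest = latest_model_package(StubSageMakerClient(), "kitten-classifier")
print(latest["ModelPackageVersion"])
```

In a real job you would pass a client created from an authenticated session, such as the `sagemaker_client` built in `gather_data.py`, instead of the stub.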

Thanks to Timothy Cheung, whose post I cribbed this from: [https://circleci.com/blog/machine-learning-ci-cd-with-aws-sagemaker/](https://circleci.com/blog/machine-learning-ci-cd-with-aws-sagemaker/)

This assumes you have set up the environment variable `SAGEMAKER_EXECUTION_ROLE_ARN` as described in the README.md in the root of this project.
50 changes: 50 additions & 0 deletions kitten_model/gather_data.py
@@ -0,0 +1,50 @@
import boto3
import os
import sagemaker

bucket = os.environ["BUCKET_NAME"]
model_name = os.environ["MODEL_NAME"]
region_name = os.environ["REGION_NAME"]
role_arn = os.environ["SAGEMAKER_EXECUTION_ROLE_ARN"]
web_identity_token = os.environ["CIRCLE_OIDC_TOKEN"]


# Set up the session and client we will need for this step
role = boto3.client("sts").assume_role_with_web_identity(
    RoleArn=role_arn, RoleSessionName="assume-role", WebIdentityToken=web_identity_token
)
credentials = role["Credentials"]
aws_access_key_id = credentials["AccessKeyId"]
aws_secret_access_key = credentials["SecretAccessKey"]
aws_session_token = credentials["SessionToken"]
boto_session = boto3.Session(
    region_name=region_name,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    aws_session_token=aws_session_token,
)
sagemaker_client = boto_session.client(service_name="sagemaker")
sagemaker_runtime_client = boto_session.client(service_name="sagemaker-runtime")
sagemaker_session = sagemaker.Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_runtime_client=sagemaker_runtime_client,
    default_bucket=bucket,
)
s3_client = boto_session.client(service_name="s3")


# Data retrieval and processing taken from
# https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb
# You would likely replace this part for your own use case, such as querying from Snowflake or Redshift

# S3 bucket where the training data is located
data_bucket = "sagemaker-sample-files"
data_prefix = "datasets/tabular/uci_abalone"

# Copy each data split from the sample bucket into our own bucket
for data_category in ["train", "validation"]:
    data_key = "{0}/{1}/abalone.{1}".format(data_prefix, data_category)
    output_key = "{0}/{1}/{1}.libsvm".format(model_name, data_category)
    data_filename = "abalone.{}".format(data_category)
    s3_client.download_file(data_bucket, data_key, data_filename)
    s3_client.upload_file(data_filename, bucket, output_key)