
Commit 2efd23b

Enh create registry (#6)
* Create model package groups in registry in CodeBuild, and remove from Notebook. Update API to ensure deployment stage matches.
* Update to pass sagemaker tags for testing register
* Updates to include additional permission required to deploy CDK resources
* Updates to add experiments to project, and include tuning jobs
* Update to add boto retry
* Minor tweaks to README
1 parent a8beecd commit 2efd23b

File tree

11 files changed

+368
-217
lines changed


README.md

Lines changed: 74 additions & 7 deletions
@@ -110,18 +110,53 @@ cdk bootstrap

 To bootstrap and deploy, you will require permissions to create AWS CloudFormation Stacks and the associated resources for your current execution role.

-If you have cloned this notebook into SageMaker Studio, you can find your user's role by browsing to the Studio dashboard.
+If you have cloned this notebook into SageMaker Studio, you will need to add additional permissions to the SageMaker Studio execution role. You can find your user's role by browsing to the Studio dashboard.

 ![\[AB Testing Pipeline Execution Role\]](docs/ab-testing-pipeline-execution-role.png)

 Browse to the [IAM](https://console.aws.amazon.com/iam) section in the console, and find this role. Then attach the following managed policies.

-* `AWSCloudFormationFullAccess`
 * `AmazonAPIGatewayAdministrator`
+* `AmazonDynamoDBFullAccess`
+* `AmazonKinesisFirehoseFullAccess`
+* `CloudWatchEventsFullAccess`
+* `AWSCloudFormationFullAccess`
 * `AWSLambda_FullAccess`
-* `AmazonKinesisFullAccess`
 * `AWSServiceCatalogAdminFullAccess`

+Then, click the **Add inline policy** link, switch to the **JSON** tab, and paste the following inline policy:
+
+```
+{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Effect": "Allow",
+            "Action": [
+                "iam:AttachRolePolicy",
+                "iam:CreateRole",
+                "iam:GetRole",
+                "iam:PutRolePolicy",
+                "iam:PassRole",
+                "iam:DetachRolePolicy",
+                "iam:DeleteRolePolicy",
+                "iam:DeleteRole"
+            ],
+            "Resource": "arn:aws:iam::*:role/ab-testing-api-*"
+        },
+        {
+            "Effect": "Allow",
+            "Action": [
+                "logs:PutRetentionPolicy"
+            ],
+            "Resource": "arn:aws:logs:**:*:log-group:ab-testing-api-*"
+        }
+    ]
+}
+```
+
+Click **Review policy**, provide the name `CDK-CreateRolePolicy`, then click **Create policy**.
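For readers who prefer scripting over the console, the same inline policy could be built and attached from Python. This is only a sketch: `build_cdk_role_policy` and `attach_inline_policy` are hypothetical helper names that are not part of this commit, the role name must be supplied by the caller, and the attach step assumes boto3 is installed and that the caller is allowed to call `iam:PutRolePolicy`.

```python
import json

# Prefix used by the resource ARNs in the README's inline policy.
ROLE_PREFIX = "ab-testing-api-"


def build_cdk_role_policy(prefix: str = ROLE_PREFIX) -> dict:
    """Return the inline policy document shown in the README as a dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "iam:AttachRolePolicy",
                    "iam:CreateRole",
                    "iam:GetRole",
                    "iam:PutRolePolicy",
                    "iam:PassRole",
                    "iam:DetachRolePolicy",
                    "iam:DeleteRolePolicy",
                    "iam:DeleteRole",
                ],
                "Resource": f"arn:aws:iam::*:role/{prefix}*",
            },
            {
                "Effect": "Allow",
                "Action": ["logs:PutRetentionPolicy"],
                # Resource pattern copied verbatim from the README.
                "Resource": f"arn:aws:logs:**:*:log-group:{prefix}*",
            },
        ],
    }


def attach_inline_policy(role_name: str, policy_name: str = "CDK-CreateRolePolicy") -> None:
    """Attach the policy to a role (hypothetical helper; needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_cdk_role_policy()),
    )


policy = build_cdk_role_policy()
print(json.dumps(policy, indent=4))
```

Only the builder runs when the script is executed; `attach_inline_policy` has to be called explicitly with the Studio execution role name.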
+
 ![\[AB Testing Pipeline Execution Role\]](docs/ab-testing-pipeline-iam-role.png)

 You should now be able to list the stacks by running:
@@ -146,22 +181,22 @@ Follow are a list of context values that are provided in the `cdk.json`, which c
 |---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|
 | `api_name` | The API Gateway Name | "ab-testing" |
 | `stage_name` | The stage namespace for resource and API Gateway path | "dev" |
-| `endpoint_prefix` | A prefix to filter which Amazon SageMaker endpoints the API can invoked. | "" |
+| `endpoint_prefix` | A prefix to filter Amazon SageMaker endpoints the API can invoke. | "" |
 | `api_lambda_memory` | The [lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html) allocation for API endpoint. | 768 |
 | `api_lambda_timeout` | The lambda timeout for the API endpoint. | 10 |
 | `metrics_lambda_memory` | The [lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html) allocated for metrics processing Lambda | 768 |
 | `metrics_lambda_timeout` | The lambda timeout for the processing lambda. | 10 |
 | `dynamodb_read_capacity` | The [Read Capacity](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) for the DynamoDB tables | 5 |
 | `dynamodb_write_capacity` | The [Write Capacity](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) for the DynamoDB tables | 5 |
-| `delivery_sync` | When set to `true`, metrics will be written directly to DynamoDB in real-time, instead of written to Amazon Kinesis for processing (recommend for testing only) | false |
-| `firehose_interval` | The [buffering](https://docs.aws.amazon.com/firehose/latest/dev/create-configure.html) interval in seconds at which the firehose will flush events to S3. | 60 |
+| `delivery_sync` | When `true`, metrics are written directly to DynamoDB instead of to Amazon Kinesis for processing. | false |
+| `firehose_interval` | The [buffering](https://docs.aws.amazon.com/firehose/latest/dev/create-configure.html) interval, in seconds, at which the firehose will flush events to S3. | 60 |
 | `firehose_mb_size` | The buffering size in MB before the firehose will flush its events to S3. | 1 |
 | `log_level` | Logging level for AWS Lambda functions | "INFO" |

 Run the following command to deploy the API and testing infrastructure, optionally overriding context values.

 ```
-cdk deploy ab-testing-api
+cdk deploy ab-testing-api -c endpoint_prefix=ab-testing-pipeline
 ```

 This stack will ask you to confirm any changes, and output the `ApiEndpoint` which you will provide to the A/B Testing sample notebook.
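When several context values need overriding, the command line can be assembled programmatically. The sketch below assumes the CDK CLI accepts one `-c key=value` pair per override, as in the command above; `cdk_deploy_command` is a hypothetical helper, and the `stage_name` override is only an illustration.

```python
import shlex


def cdk_deploy_command(stack: str, context: dict) -> list:
    """Build a `cdk deploy` argument list with one -c flag per context value."""
    cmd = ["cdk", "deploy", stack]
    for key, value in context.items():
        cmd += ["-c", f"{key}={value}"]
    return cmd


cmd = cdk_deploy_command(
    "ab-testing-api",
    {"endpoint_prefix": "ab-testing-pipeline", "stage_name": "dev"},
)
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

The list form can also be passed straight to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.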
@@ -320,6 +355,38 @@ With the Deployment Pipeline complete, you will be able to continue with the nex
 5. Plot the beta distributions of the course of the test.
 6. Calculate the statistical significance of the test.

+## Running Cost
+
+This section outlines cost considerations for running the A/B Testing Pipeline. Completing the pipeline will deploy an endpoint with 2 production variants, which will cost less than $3 per day. Further cost breakdowns are below.
+
+- **CodeBuild** – Charges per minute used. First 100 minutes each month come at no charge. For information on pricing beyond the first 100 minutes, see [AWS CodeBuild Pricing](https://aws.amazon.com/codebuild/pricing/).
+- **CodeCommit** – $1/month if you didn't opt to use your own GitHub repository.
+- **CodePipeline** – CodePipeline costs $1 per active pipeline* per month. Pipelines are free for the first 30 days after creation. More can be found at [AWS CodePipeline Pricing](https://aws.amazon.com/codepipeline/pricing/).
+- **SageMaker** – Prices vary based on EC2 instance usage for the Notebook Instances, Model Hosting, Model Training and Model Monitoring; each charged per hour of use. For more information, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).
+  - The ten `ml.c5.4xlarge` *training jobs* run for approximately 4 minutes each at $0.81 an hour, and cost less than $1 in total.
+  - The two `ml.t2.medium` instances for the production *hosting* endpoint cost 2 x $0.056 per hour, or about $2.69 per day.
+- **S3** – Low cost, prices will vary depending on the size of the models/artifacts stored. The first 50 TB each month will cost only $0.023 per GB stored. For more information, see [Amazon S3 Pricing](https://aws.amazon.com/s3/pricing/).
+- **API Gateway** – Low cost, $1.29 for first 300 million requests. For more information, see [Amazon API Gateway pricing](https://aws.amazon.com/api-gateway/pricing/).
+- **Lambda** – Low cost, $0.20 per 1 million requests. For more information, see [AWS Lambda Pricing](https://aws.amazon.com/lambda/pricing/).
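The two SageMaker figures above can be checked with a few lines of arithmetic. The sketch below uses only the rates quoted in this list (ten jobs of about 4 minutes at $0.81 per hour, and two instances at $0.056 per hour); the function names are illustrative, and current prices may differ from the ones assumed here.

```python
HOURS_PER_DAY = 24


def training_cost(jobs: int, minutes_per_job: float, hourly_rate: float) -> float:
    """Total cost of short training jobs billed at an hourly rate."""
    return jobs * (minutes_per_job / 60) * hourly_rate


def hosting_cost_per_day(instances: int, hourly_rate: float) -> float:
    """Daily cost of always-on hosting instances."""
    return instances * hourly_rate * HOURS_PER_DAY


training = training_cost(jobs=10, minutes_per_job=4, hourly_rate=0.81)
hosting = hosting_cost_per_day(instances=2, hourly_rate=0.056)

print(f"Training jobs: ${training:.2f}")
print(f"Hosting per day: ${hosting:.2f}")
```

The training total works out to roughly $0.54, and hosting to 2 x $0.056 x 24 = $2.688 per day, in line with the "less than $3 per day" estimate at the top of the section.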
+
+## Cleaning Up
+
+Once you have cleaned up the SageMaker Endpoints and Project as described in the [Sample Notebook](notebook/mab-reviews-helpfulness.ipynb), complete the clean up by deleting the **Service Catalog** and **API** resources with the AWS CDK:
+
+1. Delete the Service Catalog Portfolio and Project Template
+
+```
+cdk destroy ab-testing-service-catalog
+```
+
+2. Delete the API and testing infrastructure
+
+Before destroying the API stack, it is recommended that you [empty](https://docs.aws.amazon.com/AmazonS3/latest/userguide/empty-bucket.html) and [delete](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html) the S3 bucket that contains the logs persisted by the Kinesis Firehose.
+
+```
+cdk destroy ab-testing-api
+```
+
 ## Want to know more?

 The [FAQ](FAQ.md) page has some answers to questions on the design principles of this sample.

deployment_pipeline/app.py

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@
     "ab-testing-sagemaker",
     deployment_config=deployment_config,
     project_name=project_name,
+    project_id=project_id,
     endpoint_name=endpoint_name,
     tags=tags,
 )

deployment_pipeline/infra/model_registry.py

Lines changed: 40 additions & 1 deletion
@@ -2,6 +2,7 @@
 from datetime import datetime

 import boto3
+from botocore.config import Config
 from botocore.exceptions import ClientError

 logger = logging.getLogger(__name__)
@@ -13,7 +14,45 @@ class ModelRegistry:
     """

     def __init__(self):
-        self.sm_client = boto3.client("sagemaker")
+        config = Config(retries={"max_attempts": 10, "mode": "standard"})
+        self.sm_client = boto3.client("sagemaker", config=config)
+
+    def create_model_package_group(
+        self,
+        model_package_group_name: str,
+        description: str,
+        project_name: str,
+        project_id: str,
+    ):
+        """
+        Create the model package group if it doesn't exist.
+        """
+        try:
+            self.sm_client.create_model_package_group(
+                ModelPackageGroupName=model_package_group_name,
+                ModelPackageGroupDescription=description,
+                Tags=[
+                    {"Key": "sagemaker:project-name", "Value": project_name},
+                    {"Key": "sagemaker:project-id", "Value": project_id},
+                ],
+            )
+            logger.info(f"Model package group {model_package_group_name} created")
+            return True
+
+        except ClientError as e:
+            error_code = e.response["Error"]["Code"]
+            error_message = e.response["Error"]["Message"]
+            if (
+                error_code == "ValidationException"
+                and "Model Package Group already exists" in error_message
+            ):
+                logger.info(
+                    f"Model package group {model_package_group_name} already exists"
+                )
+                return False
+            else:
+                logger.error(error_message)
+                raise Exception(error_message)

     def get_latest_approved_packages(
         self,
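The create-if-missing behaviour added to `ModelRegistry` can be illustrated without AWS access. The sketch below is not the commit's code: `FakeClientError` and `FakeSageMakerClient` are hypothetical stand-ins for `botocore.exceptions.ClientError` and the SageMaker client, and `ensure_group` mirrors only the control flow (return `True` on creation, `False` when the group already exists, re-raise anything else).

```python
class FakeClientError(Exception):
    """Stand-in for botocore's ClientError, carrying the same response shape."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


class FakeSageMakerClient:
    """In-memory stand-in that rejects duplicate model package groups."""

    def __init__(self):
        self.groups = set()

    def create_model_package_group(self, ModelPackageGroupName, **kwargs):
        if ModelPackageGroupName in self.groups:
            raise FakeClientError(
                "ValidationException", "Model Package Group already exists"
            )
        self.groups.add(ModelPackageGroupName)
        return {"ModelPackageGroupArn": f"arn:fake:{ModelPackageGroupName}"}


def ensure_group(client, name: str, description: str) -> bool:
    """Return True if the group was created, False if it already existed."""
    try:
        client.create_model_package_group(
            ModelPackageGroupName=name,
            ModelPackageGroupDescription=description,
        )
        return True
    except FakeClientError as e:
        error = e.response["Error"]
        if (
            error["Code"] == "ValidationException"
            and "already exists" in error["Message"]
        ):
            return False
        raise  # any other error is unexpected and should surface


client = FakeSageMakerClient()
first = ensure_group(client, "demo-champion", "Champion Models for A/B Testing")
second = ensure_group(client, "demo-champion", "Champion Models for A/B Testing")
print(first, second)
```

Treating the "already exists" error as a normal outcome is what lets the stack call the method on every deployment without failing on the second run.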

deployment_pipeline/infra/sagemaker_stack.py

Lines changed: 16 additions & 3 deletions
@@ -19,6 +19,7 @@ def __init__(
         construct_id: str,
         deployment_config: DeploymentConfig,
         project_name: str,
+        project_id: str,
         endpoint_name: str,
         tags: list,
         **kwargs,
@@ -28,10 +29,22 @@ def __init__(
         # Define the package group names for champion and challenger
         champion_package_group = f"{project_name}-champion"
         challenger_package_group = f"{project_name}-challenger"
+        challenger_creation_time: datetime = None

-        # Get the approved packages for the project
+        # Create the model package groups if they don't exist
         registry = ModelRegistry()
-        challenger_creation_time: datetime = None
+        registry.create_model_package_group(
+            champion_package_group,
+            "Champion Models for A/B Testing",
+            project_name,
+            project_id,
+        )
+        registry.create_model_package_group(
+            challenger_package_group,
+            "Challenger Models for A/B Testing",
+            project_name,
+            project_id,
+        )

         # If we don't have a specific champion variant defined, get the latest approved
         if deployment_config.champion_variant_config is None:
@@ -99,7 +112,7 @@ def __init__(
                 f"arn:aws:iam::{self.account}:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole",
             )

-        # Add the challenger variant
+        # Add the champion and challenger variants
         model_configs = [
             deployment_config.champion_variant_config
         ] + deployment_config.challenger_variant_config

deployment_pipeline/infra/test_model_registry.py

Lines changed: 47 additions & 0 deletions
@@ -17,6 +17,53 @@ def get_package(version: int, creation_time: datetime = datetime.fromtimestamp(0
     }


+@pytest.mark.skip(reason="botocore.exceptions.ParamValidationError: fails with Tags")
+def test_create_model_package_group():
+    # Create model registry
+    registry = ModelRegistry()
+
+    with Stubber(registry.sm_client) as stubber:
+        # Empty list with more
+        expected_params = {
+            "ModelPackageGroupDescription": "test package group",
+            "ModelPackageGroupName": "test-package-group",
+            "Tags": [
+                {"Key": "sagemaker:project-name", "Value": "test-project-name"},
+                {"Key": "sagemaker:project-id", "Value": "test-project-id"},
+            ],
+        }
+        expected_response = {
+            "ModelPackageGroupArn": f"arn:aws:sagemaker:REGION:ACCOUNT:model-package-group/test-package-group",
+        }
+        stubber.add_response(
+            "create_model_package_group", expected_response, expected_params
+        )
+
+        # Second time, add the client error if this exists
+        stubber.add_client_error(
+            "create_model_package_group",
+            "ValidationException",
+            "Model Package Group already exists",
+            expected_params=expected_params,
+        )
+
+        created = registry.create_model_package_group(
+            "test-package-group",
+            "test package group",
+            "test-project-name",
+            "test-project-id",
+        )
+        assert created == True
+
+        created = registry.create_model_package_group(
+            "test-package-group",
+            "test package group",
+            "test-project-name",
+            "test-project-id",
+        )
+        assert created == False
+
+
 def test_get_latest_approved_model_packages():
     # Create model registry
     registry = ModelRegistry()

deployment_pipeline/register.py

Lines changed: 20 additions & 14 deletions
@@ -14,6 +14,7 @@

 # Load these from environment variables, that are passed into CodeBuild job from pipeline stack
 project_name = os.environ["SAGEMAKER_PROJECT_NAME"]
+project_id = os.environ["SAGEMAKER_PROJECT_ID"]
 stage_name = os.environ["STAGE_NAME"]
 register_lambda = os.environ["REGISTER_LAMBDA"]

@@ -24,20 +25,25 @@
 # Get the config and include with endpoint to register this model
 with open(f"{stage_name}-config.json", "r") as f:
     j = json.load(f)
-event = json.dumps({
-    'source': 'aws.sagemaker',
-    'detail-type': 'SageMaker Endpoint State Change',
-    'detail': {
-        'EndpointName': endpoint_name,
-        'EndpointStatus': 'IN_SERVICE',
-        'Tags': {
-            'ab-testing:enabled': 'true',
-            'ab-testing:strategy': j.get('strategy', 'ThompsonSampling'),
-            'ab-testing:epsilon': str(j.get('epsilon', 0.1)),
-            'ab-testing:warmup': str(j.get('warmup', 0)),
-        }
+event = json.dumps(
+    {
+        "source": "aws.sagemaker",
+        "detail-type": "SageMaker Endpoint State Change",
+        "detail": {
+            "EndpointName": endpoint_name,
+            "EndpointStatus": "IN_SERVICE",
+            "Tags": {
+                "sagemaker:project-name": project_name,
+                "sagemaker:project-id": project_id,
+                "sagemaker:deployment-stage": stage_name,
+                "ab-testing:enabled": "true",
+                "ab-testing:strategy": j.get("strategy", "ThompsonSampling"),
+                "ab-testing:epsilon": str(j.get("epsilon", 0.1)),
+                "ab-testing:warmup": str(j.get("warmup", 0)),
+            },
+        },
     }
-})
+)
 response = lambda_client.invoke(
     FunctionName=register_lambda,
     InvocationType="RequestResponse",
@@ -47,5 +53,5 @@
 # Print the result, and if not successful raise error
 result = json.loads(response["Payload"].read())
 print(result)
-if result["statusCode"] != 200:
+if result["statusCode"] not in [200, 201]:
     raise Exception("Unexpected status code: {}".format(result["statusCode"]))
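The registration payload and the relaxed status check in this diff can be exercised locally. The sketch below is an illustration only: `build_register_event` and `is_success` are hypothetical helpers, and the endpoint, project and stage values are made up. The tag keys and defaults are taken from the hunk above.

```python
import json


def build_register_event(
    endpoint_name: str,
    project_name: str,
    project_id: str,
    stage_name: str,
    config: dict,
) -> dict:
    """Build the endpoint state-change event that register.py sends to the Lambda."""
    return {
        "source": "aws.sagemaker",
        "detail-type": "SageMaker Endpoint State Change",
        "detail": {
            "EndpointName": endpoint_name,
            "EndpointStatus": "IN_SERVICE",
            "Tags": {
                "sagemaker:project-name": project_name,
                "sagemaker:project-id": project_id,
                "sagemaker:deployment-stage": stage_name,
                "ab-testing:enabled": "true",
                # Tag values are strings, so numeric settings are converted.
                "ab-testing:strategy": config.get("strategy", "ThompsonSampling"),
                "ab-testing:epsilon": str(config.get("epsilon", 0.1)),
                "ab-testing:warmup": str(config.get("warmup", 0)),
            },
        },
    }


def is_success(result: dict) -> bool:
    """Accept both 200 and 201, matching the updated check in the diff."""
    return result["statusCode"] in (200, 201)


event = build_register_event(
    "demo-endpoint", "demo-project", "p-demo", "dev", {"epsilon": 0.2}
)
payload = json.dumps(event)  # the string that would be passed as the Lambda payload
print(payload)
```

Keeping the event construction in a pure function makes the defaults easy to test without invoking the Lambda.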

deployment_pipeline/setup.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
     package_dir={"": "infra"},
     packages=setuptools.find_packages(where="infra"),
     install_requires=[
-        "boto3==1.17.33",
+        "boto3>=1.17.54",
         "aws-cdk.core==1.94.1",
         "aws-cdk.aws-iam==1.94.1",
         "aws-cdk.aws-sagemaker==1.94.1",

infra/api_stack.py

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ def __init__(
             environment={
                 "METRICS_TABLE": metrics_table.table_name,
                 "DELIVERY_STREAM_NAME": delivery_stream_name,
-                "DELIVERY_SYNC": "true" if delivery_sync else "false",
+                "STAGE_NAME": stage_name,
                 "LOG_LEVEL": log_level,
                 "ENDPOINT_PREFIX": endpoint_prefix,
             },
