
Commit a85b084

Update notebook to approve winning version, and added model package group image to operations manual
1 parent a00bb70 commit a85b084

File tree: 3 files changed (+48, -14 lines)


OPERATIONS.md

Lines changed: 6 additions & 3 deletions
@@ -4,8 +4,7 @@ Having created the A/B Testing Deployment Pipeline, this operations manual provi
 
 ## A/B Testing for Machine Learning models
 
-Successful A/B Testing for machine learning models requires measuring how effective predictions are against end users.
-It is important to be able to identify users consistently and be able to attribute success actions against the model predictions back to users.
+Successful A/B Testing for machine learning models requires measuring how effective predictions are against end users. It is important to be able to identify users consistently and be able to attribute success actions against the model predictions back to users.
 
 ### Conversion Metrics
 
@@ -46,7 +45,11 @@ The configuration is stored in the CodeCommit source repository by stage name eg
 * `epsilon` - The epsilon parameter used by the `EpsilonGreedy` strategy.
 * `warmup` - The number of invocations to warm up before applying the strategy.
 
-In addition to the above, you must specify the `champion` and `challenger` model variants for the deployment.
+In addition to the above, you must specify the `champion` and `challenger` model variants for the deployment.
+
+These will be loaded from the two Model Package Groups in the registry whose names include the project name suffixed with `champion` or `challenger`. For example, for the project name `ab-testing-pipeline` used in the sample notebook, the model package groups are:
+
+![Model Registry](docs/ab-testing-pipeline-model-registry.png)
 
 **Latest Approved Versions**
 
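
For reference, a group name is formed by suffixing the project name, so a project called `ab-testing-pipeline` would typically use the groups `ab-testing-pipeline-champion` and `ab-testing-pipeline-challenger`. Below is a minimal boto3 sketch of resolving both groups; the hyphen separator and the `project_name` value are assumptions based on the naming convention described above, not code from this repository.

```python
import boto3

sm_client = boto3.client("sagemaker")
project_name = "ab-testing-pipeline"  # example project name from the text above

# Assumed convention: "<project_name>-champion" and "<project_name>-challenger".
for suffix in ("champion", "challenger"):
    group_name = f"{project_name}-{suffix}"
    group = sm_client.describe_model_package_group(ModelPackageGroupName=group_name)
    print(group["ModelPackageGroupName"], group["ModelPackageGroupStatus"])
```
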
docs/ab-testing-pipeline-model-registry.png

Binary image added (50.6 KB)

notebook/mab-reviews-helpfulness.ipynb

Lines changed: 42 additions & 11 deletions
@@ -286,7 +286,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Create new pandas series that concat the label and tokenize values (this should less than 2 minutes)"
+"Create a new pandas series that concatenates the label and tokenize values (this should take less than 2 minutes)"
 ]
 },
 {
@@ -587,7 +587,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now, we can create the pipeline\n"
+"Now, we can create the pipeline."
 ]
 },
 {
@@ -792,13 +792,13 @@
 " latest_model_package_arn = package['ModelPackageArn']\n",
 " model_package_version = latest_model_package_arn.split('/')[-1]\n",
 " if package['ModelApprovalStatus'] == 'PendingManualApproval':\n",
-" print(f\"Approving Version: {model_package_version}\")\n",
+" print(f\"Approving Champion Version: {model_package_version}\")\n",
 " model_package_update_response = sm_client.update_model_package(\n",
 " ModelPackageArn=latest_model_package_arn,\n",
 " ModelApprovalStatus=\"Approved\",\n",
 " )\n",
 " else:\n",
-" print(f\"Model Version: {model_package_version} approved\")"
+" print(f\"Champion Version: {model_package_version} approved\")"
 ]
 },
 {
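
The `package` dictionary in the hunk above is produced by earlier notebook cells that fall outside this diff. As a hedged sketch of how such a lookup is commonly done, the newest version in the champion Model Package Group can be fetched before running the approval check; the variable name `champion_model_group` mirrors the one used later in the notebook, and the group name assigned here is an assumed example.

```python
import boto3

sm_client = boto3.client("sagemaker")
champion_model_group = "ab-testing-pipeline-champion"  # assumed example group name

# Fetch the most recently created package in the champion group so it can be
# approved if its status is still PendingManualApproval.
response = sm_client.list_model_packages(
    ModelPackageGroupName=champion_model_group,
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
package = response["ModelPackageSummaryList"][0]
print(package["ModelPackageArn"], package["ModelApprovalStatus"])
```
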
@@ -900,7 +900,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"max_jobs = 15\n",
+"max_jobs = 9\n",
 "objective_name = 'validation:accuracy'\n",
 "tuner = HyperparameterTuner(estimator, \n",
 " objective_name,\n",
@@ -1091,7 +1091,7 @@
 ")\n",
 "\n",
 "model_package_version = model_package.model_package_arn.split('/')[-1]\n",
-"print(f\"Registered and Approved Version: {model_package_version}\")"
+"print(f\"Registered and Approved Challenger Version: {model_package_version}\")"
 ]
 },
 {
@@ -1391,8 +1391,9 @@
 " r = requests.put(rest_api, data=json.dumps(payload))\n",
 " return r.status_code, r.json() # Returns 201 if new, or 200 if update\n",
 "\n",
-"def api_invocation(text_array):\n",
+"def api_invocation(user_id, text_array):\n",
 " payload = {\n",
+" \"user_id\": user_id,\n",
 " \"endpoint_name\": endpoint_name, \n",
 " \"content_type\": \"application/json\",\n",
 " \"data\": json.dumps({\"instances\" : text_array, \"configuration\": { \"k\": 1 }}), \n",
@@ -1484,7 +1485,7 @@
 "outputs": [],
 "source": [
 "def api_predict(i):\n",
-" result = api_invocation(input_batch[i])\n",
+" result = api_invocation(i, input_batch[i])\n",
 " if 'predictions' in result:\n",
 " predictions = parse_predictions(result['predictions']) \n",
 " # Join predictions with test results\n",
@@ -1686,9 +1687,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Calling the winner\n",
+"## Calling the winner\n",
 "\n",
-"Assuming a normal distribution, lets evaluate a confidence score for the best performing variant."
+"### Evaluate if statistically significant\n",
+"\n",
+"Assuming a normal distribution, let's evaluate a confidence score for the best performing variant."
 ]
 },
 {
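
The `conf` calculation itself sits outside this hunk. As a hedged illustration of the idea in the markdown cell above, a confidence score for the better-performing variant can be derived from a two-proportion z-test under the normality assumption; the invocation and conversion counts below are made-up example numbers, not values from the notebook.

```python
from math import sqrt

from scipy.stats import norm

# Illustrative counts only: invocations and successful conversions per variant.
invocations_a, conversions_a = 1000, 120   # e.g. champion
invocations_b, conversions_b = 1000, 150   # e.g. challenger

p_a = conversions_a / invocations_a
p_b = conversions_b / invocations_b
p_pooled = (conversions_a + conversions_b) / (invocations_a + invocations_b)

# Standard error of the difference in proportions under the pooled estimate.
se = sqrt(p_pooled * (1 - p_pooled) * (1 / invocations_a + 1 / invocations_b))
z = (p_b - p_a) / se
conf = norm.cdf(z)  # confidence that variant B truly converts better
print(f"z = {z:.2f}, confidence = {conf:.1%}")
```
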
@@ -1733,6 +1736,34 @@
 "conf"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Promote Model\n",
+"\n",
+"If our new estimator is a winning model, we can register that model in the `Champion` model package group in the registry to trigger a new deployment with this single variant."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"model_package = best_estimator.register(\n",
+" content_types=[\"text/plain\"],\n",
+" response_types=[\"text/csv\"],\n",
+" inference_instances=[\"ml.t2.medium\", \"ml.m5.xlarge\"],\n",
+" transform_instances=[\"ml.m5.xlarge\"],\n",
+" model_package_group_name=champion_model_group,\n",
+" approval_status=\"Approved\"\n",
+")\n",
+"\n",
+"model_package_version = model_package.model_package_arn.split('/')[-1]\n",
+"print(f\"Registered and Approved Champion Version: {model_package_version}\")"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -1825,4 +1856,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
