From f8cb55d1ec620b22bf9b6c9b7f88f5fdfcd03fd0 Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Thu, 4 Sep 2025 07:39:45 +0000 Subject: [PATCH 01/23] start work on new bq ui --- docs/labs/2_data_ingestion.md | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index c187c6f..6ef4179 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -43,11 +43,12 @@ echo $CONN_SERVICE_ACCOUNT ``` Let's double check the service account. - + 1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery). -2. Expand {{ PROJECT_ID }} -3. Expand External connections -4. Click ``us.fraud-transactions-conn``. +2. Click Explorer +3. Expand {{ PROJECT_ID }} +4. Click External connections +5. Click fraud-transactions-conn Is the service account equivalent to the one you got from the command line? @@ -62,10 +63,10 @@ gcloud storage buckets add-iam-policy-binding gs://{{ PROJECT_ID }}-bucket \ Let's create a data set that contains the table and the external connection to Cloud Storage. 1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) -2. Click the three vertical dots ⋮ next to `{{ PROJECT_ID }}` in the navigation menu -3. Click Create dataset +2. Hover your mouse over {{ PROJECT_ID }} +3. Click the three vertical dots (⋮) and go to `Create dataset` 4. Enter `ml_datasets` (plural) in the ID field. Region should be multi-region US. -5. Click Create dataset +5. Click `Create dataset` Alternatively, you can create the data set on the command line: ```bash @@ -99,10 +100,12 @@ bq mk --table \ Let's have a look at the data set: 1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) -2. Expand {{ PROJECT_ID }} -3. Expand ml_datasets -4. Click ``ulb_fraud_detection_biglake`` -5. Click DETAILS +2. Expand {{ PROJECT_ID }} +3. Click Datasets +4. Click ml_datasets +5. Click Tables +6. 
Click ulb_fraud_detection_biglake +7. Click DETAILS Have a look at the external data configuration. You can see the Cloud Storage bucket (`gs://...`) your data lives in. From 63bec7455a83f5bc0913e116b12300aa75e43a6f Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Thu, 4 Sep 2025 07:46:58 +0000 Subject: [PATCH 02/23] bigquery ui changed once again --- docs/labs/2_data_ingestion.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index 6ef4179..6711942 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -103,9 +103,8 @@ Let's have a look at the data set: 2. Expand {{ PROJECT_ID }} 3. Click Datasets 4. Click ml_datasets -5. Click Tables -6. Click ulb_fraud_detection_biglake -7. Click DETAILS +5. Click ulb_fraud_detection_biglake +6. Click DETAILS Have a look at the external data configuration. You can see the Cloud Storage bucket (`gs://...`) your data lives in. From f7517c30f23cf2f06af2781c4994700adb6e0e7a Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Thu, 9 Oct 2025 09:09:37 +0000 Subject: [PATCH 03/23] fix dataplex spotlights --- docs/labs/5_dataplex.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/labs/5_dataplex.md b/docs/labs/5_dataplex.md index ab2fa81..f07600c 100644 --- a/docs/labs/5_dataplex.md +++ b/docs/labs/5_dataplex.md @@ -148,9 +148,8 @@ You can filter the data to be scanned for profiling by using row filters and col Dataplex lets you specify a percentage of records from your data to sample for running a data profiling scan. Creating data profiling scans on a smaller sample of data can reduce the execution time and cost of querying the entire dataset. Let's get started: - -1. Go to the Profile section in Dataplex. -2. Click + CREATE DATA PROFILE SCAN +1. Go to the Data profiling & quality section in Dataplex. +2. Click Create data profile scan 3. 
Set Display Name to `bootkon-profile-fraud-prediction` for example 4. Optionally add a description. For example, "data profile scans for fraud detection predictions" 5. Leave the “Browse within Dataplex Lakes” option turned off @@ -223,9 +222,9 @@ Creating and using a data quality scan consists of the following steps: **Lab Instructions** -1. Go to the [Data Quality](https://console.cloud.google.com/dataplex/govern/quality) section in the left hand menu of Dataplex. +1. Go to the Data profiling & quality section in the left hand menu of Dataplex. -2. Click on + CREATE DATA QUALITY SCAN +2. Click on Create data quality scan 3. Display Name: `bootkon-dquality-fraud-prediction` for example 4. Optionally add a description. For example, "data quality scans for fraud detection predictions" 5. Leave the "Browse within Dataplex Lakes" option turned off From 82e50dc490a45da81f8bdeb486cf7964d9c1ac0f Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Thu, 9 Oct 2025 11:36:18 +0200 Subject: [PATCH 04/23] Update bk --- .scripts/bk | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.scripts/bk b/.scripts/bk index 8096ccd..bdbd177 100755 --- a/.scripts/bk +++ b/.scripts/bk @@ -70,8 +70,6 @@ export BK_INITIALIZED=1 cd ~/ -pip install --quiet jinja2 nbformat nbconvert - if ! 
command -v git &> /dev/null; then sudo apt update sudo apt install -y git @@ -156,6 +154,8 @@ grep -qxF "$line" ~/.bashrc || echo "$line" >> ~/.bashrc unset line +pip install --quiet jinja2 + echo echo -e " __ --------------------------------------------------------" echo -e " _(\ |${RED}@@${NC}| | |" @@ -171,4 +171,4 @@ echo if [ "$(basename ${BASH_SOURCE[0]})" != "bk" ]; then # This script is run the first time from GitHub bk-start -fi \ No newline at end of file +fi From a91b87399fd72c7a5ff2371d4bb6c8cebdca89ab Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Tue, 14 Oct 2025 09:18:36 +0200 Subject: [PATCH 05/23] Update bk --- .scripts/bk | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.scripts/bk b/.scripts/bk index bdbd177..d0523bf 100755 --- a/.scripts/bk +++ b/.scripts/bk @@ -70,6 +70,8 @@ export BK_INITIALIZED=1 cd ~/ +pip install --quiet jinja2 nbformat nbconvert + if ! command -v git &> /dev/null; then sudo apt update sudo apt install -y git @@ -154,8 +156,6 @@ grep -qxF "$line" ~/.bashrc || echo "$line" >> ~/.bashrc unset line -pip install --quiet jinja2 - echo echo -e " __ --------------------------------------------------------" echo -e " _(\ |${RED}@@${NC}| | |" From 2f728c22a50d24d5ddba128f55c666a65b49bea4 Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Tue, 21 Oct 2025 11:13:53 +0200 Subject: [PATCH 06/23] remove pip install --- .scripts/bk | 2 -- 1 file changed, 2 deletions(-) diff --git a/.scripts/bk b/.scripts/bk index d0523bf..98fca01 100755 --- a/.scripts/bk +++ b/.scripts/bk @@ -70,8 +70,6 @@ export BK_INITIALIZED=1 cd ~/ -pip install --quiet jinja2 nbformat nbconvert - if ! command -v git &> /dev/null; then sudo apt update sudo apt install -y git From b5354a9673fed36fc0a85c9f423d6073a9dfdf88 Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Tue, 21 Oct 2025 11:15:00 +0200 Subject: [PATCH 07/23] Add nbformat and nbconvert imports for rendering Import nbformat and nbconvert modules for Jupyter rendering. 
--- .scripts/bk-render-jinja2 | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/.scripts/bk-render-jinja2 b/.scripts/bk-render-jinja2 index 197d86a..2f223c2 100755 --- a/.scripts/bk-render-jinja2 +++ b/.scripts/bk-render-jinja2 @@ -6,10 +6,8 @@ import json import sys import os import re -import nbformat import base64 from functools import partial -from nbconvert import HTMLExporter, MarkdownExporter import jinja2 @@ -42,6 +40,9 @@ def apply_to_content(data, func): def render_jupyter(path): + import nbformat + from nbconvert import HTMLExporter, MarkdownExporter + with open(path) as f: nb = nbformat.read(f, as_version=4) exporter = HTMLExporter() From 97c5e8378321304a5341022f3e2a6152f886292e Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 09:54:32 +0000 Subject: [PATCH 08/23] Change name of external connection --- docs/labs/2_data_ingestion.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index 6711942..a9fec3c 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -47,7 +47,7 @@ Let's double check the service account. 1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery). 2. Click Explorer 3. Expand {{ PROJECT_ID }} -4. Click External connections +4. Click Connections 5. Click fraud-transactions-conn Is the service account equivalent to the one you got from the command line? 
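Patch 07 above moves the `nbformat`/`nbconvert` imports out of module scope and into `render_jupyter`, so `bk-render-jinja2` can start even when those optional packages are missing, as long as no notebook is actually rendered. A minimal sketch of that deferred-import pattern (illustrative only: `render_notebook` is an invented name, and stdlib `base64` stands in for the heavy optional dependency):

```python
def render_notebook(source: str) -> str:
    """Render a notebook-like payload, loading heavy deps on first use."""
    # Deferred import: module-level code runs even if this dependency is
    # absent; an ImportError only surfaces when rendering is requested.
    import base64  # stand-in for nbformat / nbconvert

    return base64.b64encode(source.encode()).decode()

# Callers that never render a notebook never pay the import (or install) cost.
print(render_notebook("hi"))  # prints "aGk="
```

The trade-off is that a missing dependency is reported at call time rather than at startup, which suits an optional feature like notebook rendering.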
From 99536b18f7191d96ed2983247b56a6c4a45f7183 Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 10:28:43 +0000 Subject: [PATCH 09/23] bq UI changes explorer --- docs/labs/2_data_ingestion.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index a9fec3c..186c181 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -74,10 +74,11 @@ bq --location=us mk -d ml_datasets ``` Next, we connect the data in Cloud Storage to BigQuery: -1. Click + Add data -2. Click Google Cloud Storage -3. Select `Load to BigQuery` -4. Enter the following details: +1. Choose Explorer +2. Click + Add data +3. Click Google Cloud Storage +4. Select `Load to BigQuery` +5. Enter the following details: - Create table from: `Google Cloud Storage` - Select file: `{{ PROJECT_ID }}-bucket/data/parquet/ulb_fraud_detection/*` - File format: `Parquet` @@ -88,7 +89,7 @@ Next, we connect the data in Cloud Storage to BigQuery: - Check *Create a BigLake table using a Cloud Resource connection* - Connection ID: Select `us.fraud-transactions-conn` - Schema: `Auto detect` -5. Click on Create table +6. Click on Create table Alternatively, you can also use the command line to create the table: From cfc06b7e9ef521aca2e4debf5a65d5ad96f1b794 Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 10:35:35 +0000 Subject: [PATCH 10/23] bq UI changes explorer --- docs/labs/2_data_ingestion.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index 186c181..af84c2b 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -100,12 +100,13 @@ bq mk --table \ ``` Let's have a look at the data set: -1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) -2. Expand {{ PROJECT_ID }} -3. Click Datasets -4. Click ml_datasets -5. 
Click ulb_fraud_detection_biglake -6. Click DETAILS +1. Choose Explorer +2. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) +3. Expand {{ PROJECT_ID }} +4. Click Datasets +5. Click ml_datasets +6. Click ulb_fraud_detection_biglake +7. Click DETAILS Have a look at the external data configuration. You can see the Cloud Storage bucket (`gs://...`) your data lives in. From 5441ea00d4fa7b121de14101f9b3707e8e75cb59 Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 10:37:08 +0000 Subject: [PATCH 11/23] bq UI changes explorer --- docs/labs/2_data_ingestion.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index af84c2b..452ad01 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -100,8 +100,8 @@ bq mk --table \ ``` Let's have a look at the data set: -1. Choose Explorer -2. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) +1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) +2. Choose Explorer 3. Expand {{ PROJECT_ID }} 4. Click Datasets 5. Click ml_datasets From 9abc4448c7d286ee01f7a98dba39abbd3dc92bfc Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 10:52:00 +0000 Subject: [PATCH 12/23] bq UI changes explorer --- docs/labs/2_data_ingestion.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index 452ad01..9f03568 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -63,10 +63,11 @@ gcloud storage buckets add-iam-policy-binding gs://{{ PROJECT_ID }}-bucket \ Let's create a data set that contains the table and the external connection to Cloud Storage. 1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) -2. Hover your mouse over {{ PROJECT_ID }} -3. Click the three vertical dots (⋮) and go to `Create dataset` -4. 
Enter `ml_datasets` (plural) in the ID field. Region should be multi-region US. -5. Click `Create dataset` +2. Choose Explorer +3. Hover your mouse over {{ PROJECT_ID }} +4. Click the three vertical dots (⋮) and go to `Create dataset` +5. Enter `ml_datasets` (plural) in the ID field. Region should be multi-region US. +6. Click `Create dataset` Alternatively, you can create the data set on the command line: ```bash From aa3e12f2b1fd5350a13d100fed525a151a22a8cb Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 11:07:40 +0000 Subject: [PATCH 13/23] label changes in bq Details and query --- docs/labs/2_data_ingestion.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/labs/2_data_ingestion.md b/docs/labs/2_data_ingestion.md index 9f03568..214903d 100644 --- a/docs/labs/2_data_ingestion.md +++ b/docs/labs/2_data_ingestion.md @@ -107,14 +107,14 @@ Let's have a look at the data set: 4. Click Datasets 5. Click ml_datasets 6. Click ulb_fraud_detection_biglake -7. Click DETAILS +7. Click Details Have a look at the external data configuration. You can see the Cloud Storage bucket (`gs://...`) your data lives in. Let's query it: -1. Click QUERY +1. Click Query 2. Insert the following SQL query. ```sql From 06627ff73ff26a22ac152968beb16821dd964015 Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 11:37:04 +0000 Subject: [PATCH 14/23] bq UI changes explorer --- docs/labs/3_dataform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/labs/3_dataform.md b/docs/labs/3_dataform.md index fb32960..bc53b13 100644 --- a/docs/labs/3_dataform.md +++ b/docs/labs/3_dataform.md @@ -209,7 +209,7 @@ Go to [Dataform](https://console.cloud.google.com/bigquery/dataform)\> ``{{ PROJECT_ID }}`` \> External connections \> `fraud-transactions-conn` +1. 
You can find the service account ID under [BigQuery Studio](https://console.cloud.google.com/bigquery) \> Explorer \> ``{{ PROJECT_ID }}`` \> Connections \> `fraud-transactions-conn` serviceaccountconnection From 0bf66d72cb989c16a7206016e91ecc9a41bde167 Mon Sep 17 00:00:00 2001 From: Cary Edwards Date: Tue, 21 Oct 2025 11:41:03 +0000 Subject: [PATCH 15/23] IAM link --- docs/labs/3_dataform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/labs/3_dataform.md b/docs/labs/3_dataform.md index bc53b13..1b3e440 100644 --- a/docs/labs/3_dataform.md +++ b/docs/labs/3_dataform.md @@ -213,7 +213,7 @@ For the sentiment inference step to succeed, you need to grant the external con serviceaccountconnection -2. Take note of the service account and grant it the `Vertex AI User` role. +2. Take note of the service account and grant it the `Vertex AI User` role in [IAM](https://console.cloud.google.com/iam-admin). vertexairole 3. Back in your [Dataform](https://console.cloud.google.com/bigquery/dataform) workspace, click Start execution from the top menu, then Execute Actions From 18da58ef84837896f112d9d17b2fc9fa8b28c4b0 Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Tue, 21 Oct 2025 14:51:43 +0000 Subject: [PATCH 16/23] do not auto init bootkon --- .scripts/bk | 39 +++++++++++++++++---------------------- 1 file changed, 17 insertions(+), 22 deletions(-) diff --git a/.scripts/bk b/.scripts/bk index 98fca01..d1adf1f 100755 --- a/.scripts/bk +++ b/.scripts/bk @@ -65,7 +65,6 @@ export BK_REPO_URL="https://github.com/${BK_REPO}.git" export BK_TUTORIAL="${BK_TUTORIAL:-docs/TUTORIAL.md}" # defaults to .TUTORIAL.md; can be overwritten export BK_BRANCH="${BK_BRANCH:-main}" # defaults to main; can be overwritten export BK_DIR=~/${BK_GITHUB_REPOSITORY} -export BK_INIT_SCRIPT=~/${BK_GITHUB_REPOSITORY}/bk export BK_INITIALIZED=1 cd ~/ @@ -85,15 +84,29 @@ fi cd $BK_GITHUB_REPOSITORY NEW_PATH=~/${BK_GITHUB_REPOSITORY}/.scripts +PATH_EXPORT_LINE="export 
PATH=\${HOME}/${BK_GITHUB_REPOSITORY}/.scripts:\$PATH" -# Check if the new path is already in the PATH +# 1. Add to current session's PATH if missing if [[ ":$PATH:" != *":$NEW_PATH:"* ]]; then -echo -e "${MAGENTA}Adding $NEW_PATH to your PATH${NC}" + echo -e "${MAGENTA}Adding $NEW_PATH to your current session's PATH${NC}" export PATH=${NEW_PATH}:$PATH else - echo -e "${GREEN}Your PATH already contains $NEW_PATH. Not adding it again.${NC}" + echo -e "${GREEN}Your current session's PATH already contains $NEW_PATH. Not adding it again.${NC}" fi + +# 2. Persist the PATH setting in ~/.bashrc if not already there +if ! grep -qF "$PATH_EXPORT_LINE" ~/.bashrc ; then + echo -e "${MAGENTA}Adding $NEW_PATH to ~/.bashrc for future sessions.${NC}" + # Use '>>' to append the line to the file + echo "$PATH_EXPORT_LINE" >> ~/.bashrc +else + echo -e "${GREEN}The permanent PATH export for $NEW_PATH is already in ~/.bashrc. Skipping.${NC}" +fi + unset NEW_PATH +unset PATH_EXPORT_LINE + + echo -e "Sourcing $(readlink -f vars.sh)" source vars.sh @@ -136,24 +149,6 @@ else echo "$line" >> ~/.bashrc fi -## Set or update $BK_INIT_SCRIPT in ~/.bashrc -line="export BK_INIT_SCRIPT=~/${BK_GITHUB_REPOSITORY}/.scripts/bk" -if grep -q '^export BK_INIT_SCRIPT=' ~/.bashrc; then - # If the line exists but differs, update it - if ! grep -Fxq "$line" ~/.bashrc; then - sed -i "s|^export BK_INIT_SCRIPT=.*|$line|" ~/.bashrc - echo "Updated the existing BK_INIT_SCRIPT line in ~/.bashrc." 
- fi -else - echo "$line" >> ~/.bashrc -fi - -## Load $BK_INIT_SCRIPT in ~/.bashrc -line='if [ -f ${BK_INIT_SCRIPT} ]; then source ${BK_INIT_SCRIPT}; fi' -grep -qxF "$line" ~/.bashrc || echo "$line" >> ~/.bashrc - -unset line - echo echo -e " __ --------------------------------------------------------" echo -e " _(\ |${RED}@@${NC}| | |" From bef83126276e2b8717806d7cceb3b2efdb8f566f Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Tue, 21 Oct 2025 15:18:18 +0000 Subject: [PATCH 17/23] do not auto init script and fix vertex ai perms --- .scripts/bk | 14 ++++++-------- .scripts/bk-bootstrap | 32 ++++++++++++++++++++++++++++++++ .scripts/bk-start | 8 ++++++++ 3 files changed, 46 insertions(+), 8 deletions(-) diff --git a/.scripts/bk b/.scripts/bk index d1adf1f..cea70b9 100755 --- a/.scripts/bk +++ b/.scripts/bk @@ -106,17 +106,15 @@ fi unset NEW_PATH unset PATH_EXPORT_LINE +echo -e "Sourcing $(readlink -f $BK_DIR/vars.sh)" +source $BK_DIR/vars.sh - -echo -e "Sourcing $(readlink -f vars.sh)" -source vars.sh - -if [ -f vars.local.sh ]; then - echo -e "Sourcing $(readlink -f vars.local.sh)" - source vars.local.sh +if [ -f $BK_DIR/vars.local.sh ]; then + echo -e "Sourcing $(readlink -f $BK_DIR/vars.local.sh)" + source $BK_DIR/vars.local.sh fi -echo -e "Variables from vars.sh: PROJECT_ID=${YELLOW}$PROJECT_ID${NC} GCP_USERNAME=${YELLOW}$GCP_USERNAME${NC} REGION=${YELLOW}$REGION${NC}" +echo -e "Variables from $BK_DIR/vars.sh: PROJECT_ID=${YELLOW}$PROJECT_ID${NC} GCP_USERNAME=${YELLOW}$GCP_USERNAME${NC} REGION=${YELLOW}$REGION${NC}" if [ -z $PROJECT_ID ]; then diff --git a/.scripts/bk-bootstrap b/.scripts/bk-bootstrap index 14b0aae..c19399a 100755 --- a/.scripts/bk-bootstrap +++ b/.scripts/bk-bootstrap @@ -58,3 +58,35 @@ for role in "${service_account_roles[@]}"; do gcloud projects add-iam-policy-binding "$PROJECT_ID" \ --member="serviceAccount:$COMPUTE_SERVICE_ACCOUNT" --role="$role" >>/dev/null done + +declare -a vertex_service_account_roles=( + "roles/dataproc.worker" 
# Can perform actions as a Dataproc worker + "roles/bigquery.dataEditor" # Can edit BigQuery datasets + "roles/bigquery.jobUser" # Can run BigQuery jobs + "roles/storage.objectAdmin" # Admin on Cloud Storage objects + "roles/storage.admin" # Admin on Cloud Storage + "roles/storage.objectViewer" # Can view Cloud Storage objects + "roles/iam.serviceAccountUser" # Can use service accounts + "roles/pubsub.admin" # Admin on Pub/Sub + "roles/serviceusage.serviceUsageConsumer" # Can use services + "roles/artifactregistry.admin" # Artifact registry admin + "roles/resourcemanager.projectIamAdmin" # Project IAM admin + "roles/aiplatform.admin" # Vertex AI admin + "roles/cloudbuild.builds.editor" # Cloud Build editor +) + +VERTEX_AI_CC_SERVICE_ACCOUNT="service-$PROJECT_NUMBER@gcp-sa-aiplatform-cc.iam.gserviceaccount.com" + +# Array of roles to grant to the Vertex AI Custom Code Service Agent +declare -a vertex_cc_service_agent_roles=( + "roles/artifactregistry.reader" # AI Platform Artifact Registry Reader + "roles/artifactregistry.serviceAgent" # Custom Artifact Registry Service Agent + "roles/aiplatform.customCodeServiceAgent" # Vertex AI Custom Code Service Agent +) + +# Assign roles to the Vertex AI Custom Code Service Account +for role in "${vertex_cc_service_agent_roles[@]}"; do + echo "Assigning role $role to $VERTEX_AI_CC_SERVICE_ACCOUNT in project $PROJECT_ID..." + gcloud projects add-iam-policy-binding "$PROJECT_ID" \ + --member="serviceAccount:$VERTEX_AI_CC_SERVICE_ACCOUNT" --role="$role" >>/dev/null +done \ No newline at end of file diff --git a/.scripts/bk-start b/.scripts/bk-start index afc1bda..047e828 100755 --- a/.scripts/bk-start +++ b/.scripts/bk-start @@ -1,5 +1,13 @@ #!/bin/sh cd $BK_DIR + +if [ -z "$BK_INITIALIZED" ]; then + echo "Bootkon has not been initialized." + echo "Please execute: " + echo " . bk (including the dot)" + exit 1 +fi + bk-tutorial $BK_TUTORIAL cloudshell open-workspace . 
\ No newline at end of file

From 46c239cb598027fc15ee659f7104d152f4f0901b Mon Sep 17 00:00:00 2001
From: Cary Edwards
Date: Tue, 21 Oct 2025 15:30:20 +0000
Subject: [PATCH 18/23] endpoints label

---
 docs/labs/4_ml.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/labs/4_ml.md b/docs/labs/4_ml.md
index 5791e8c..654222d 100644
--- a/docs/labs/4_ml.md
+++ b/docs/labs/4_ml.md
@@ -80,7 +80,7 @@ Here you can see that a model in the Vertex AI Model Registry is made up fro
 The endpoint is created in a parallel branch in the pipeline you just ran. You can deploy models to an endpoint through the model registry.
 
-1. Click Online Prediction in the navigation menu
+1. Click Endpoints in the navigation menu
 2. Click bootkon-endpoint
 
 You can see that the endpoint has one model deployed currently, and all the traffic is routed to it (traffic split is 100%). When scrolling down, you get live graphs as soon as predictions are coming in.

From 3b7807c551d2843afbbe6cfa8365b621f7821f4d Mon Sep 17 00:00:00 2001
From: Fabian Hirschmann
Date: Tue, 21 Oct 2025 15:33:15 +0000
Subject: [PATCH 19/23] remove superfluous roles

---
 .scripts/bk-bootstrap | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/.scripts/bk-bootstrap b/.scripts/bk-bootstrap
index c19399a..eba167d 100755
--- a/.scripts/bk-bootstrap
+++ b/.scripts/bk-bootstrap
@@ -59,22 +59,6 @@ for role in "${service_account_roles[@]}"; do
         --member="serviceAccount:$COMPUTE_SERVICE_ACCOUNT" --role="$role" >>/dev/null
 done
 
-declare -a vertex_service_account_roles=(
-    "roles/dataproc.worker" # Can perform actions as a Dataproc worker
-    "roles/bigquery.dataEditor" # Can edit BigQuery datasets
-    "roles/bigquery.jobUser" # Can run BigQuery jobs
-    "roles/storage.objectAdmin" # Admin on Cloud Storage objects
-    "roles/storage.admin" # Admin on Cloud Storage
-    "roles/storage.objectViewer" # Can view Cloud Storage objects
-    "roles/iam.serviceAccountUser" # Can use service accounts
-    "roles/pubsub.admin" #
Admin on Pub/Sub - "roles/serviceusage.serviceUsageConsumer" # Can use services - "roles/artifactregistry.admin" # Artifact registry admin - "roles/resourcemanager.projectIamAdmin" # Project IAM admin - "roles/aiplatform.admin" # Vertex AI admin - "roles/cloudbuild.builds.editor" # Cloud Build editor -) - VERTEX_AI_CC_SERVICE_ACCOUNT="service-$PROJECT_NUMBER@gcp-sa-aiplatform-cc.iam.gserviceaccount.com" # Array of roles to grant to the Vertex AI Custom Code Service Agent From e4c74075673873dedc1a1adf9f6a699ea5c18eb1 Mon Sep 17 00:00:00 2001 From: Fabian Hirschmann Date: Tue, 21 Oct 2025 17:50:43 +0200 Subject: [PATCH 20/23] Comment out role assignment in bk-bootstrap script Comment out the role assignment loop for Vertex AI Custom Code Service Agent. --- .scripts/bk-bootstrap | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/.scripts/bk-bootstrap b/.scripts/bk-bootstrap index eba167d..90c9dde 100755 --- a/.scripts/bk-bootstrap +++ b/.scripts/bk-bootstrap @@ -69,8 +69,8 @@ declare -a vertex_cc_service_agent_roles=( ) # Assign roles to the Vertex AI Custom Code Service Account -for role in "${vertex_cc_service_agent_roles[@]}"; do - echo "Assigning role $role to $VERTEX_AI_CC_SERVICE_ACCOUNT in project $PROJECT_ID..." - gcloud projects add-iam-policy-binding "$PROJECT_ID" \ - --member="serviceAccount:$VERTEX_AI_CC_SERVICE_ACCOUNT" --role="$role" >>/dev/null -done \ No newline at end of file +#for role in "${vertex_cc_service_agent_roles[@]}"; do +# echo "Assigning role $role to $VERTEX_AI_CC_SERVICE_ACCOUNT in project $PROJECT_ID..." 
+#    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
+#        --member="serviceAccount:$VERTEX_AI_CC_SERVICE_ACCOUNT" --role="$role" >>/dev/null
+#done

From 1e0e308f6c2f68c2a469ec833d13c43acf588bf3 Mon Sep 17 00:00:00 2001
From: Fabian Hirschmann
Date: Wed, 22 Oct 2025 09:48:49 +0200
Subject: [PATCH 21/23] Update tutorial instructions for bootkon initialization

Clarify instructions for initializing bootkon and restarting the tutorial.
---
 docs/TUTORIAL.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/TUTORIAL.md b/docs/TUTORIAL.md
index a59447d..57db1db 100644
--- a/docs/TUTORIAL.md
+++ b/docs/TUTORIAL.md
@@ -38,19 +38,21 @@ and set `GCP_USERNAME`, `PROJECT_ID` according to the information you received.
 
 ❗ Please do not include any whitespace when setting these variables.
 
-Please reload bootkon and make sure there are no errors printed:
+Please initialize bootkon. The next command will set environment variables in your current terminal.
 
 ```bash
 . bk
 ```
-
-And restart the tutorial using the next command. You can also use the next command to continue bootkon in case you accidentally close the tutorial or the editor:
+Reload the tutorial window on the right-hand side of your screen.
 
 ```bash
 bk-start
 ```
 
+In case you accidentally close the tutorial or the editor, you can execute `bk-start` to start it again. Please make sure that you execute `. bk` in every terminal
+you open so that the environment variables are set.
+ Now, your * `PROJECT_ID` is `{% if PROJECT_ID == "" %}None{% else %}{{ PROJECT_ID }}{% endif %}` @@ -85,4 +87,4 @@ The authors of Data & AI Bootkon are: Data & AI Bootkon received contributions from many people, including: - [Christine Schulze](https://www.linkedin.com/in/christine-schulze-33822765/) - [Daniel Quinlan](https://www.linkedin.com/in/%F0%9F%8C%8Ddaniel-quinlan-51126016/) -- [Dinesh Sandra](https://www.linkedin.com/in/sandradinesh/) \ No newline at end of file +- [Dinesh Sandra](https://www.linkedin.com/in/sandradinesh/) From 6754ad741f07b8f3131a1a0cbe876571e4ec88d3 Mon Sep 17 00:00:00 2001 From: Daniel Quinlan Date: Wed, 22 Oct 2025 13:10:24 +0200 Subject: [PATCH 22/23] Locator fix for layout changes in Vertex AI --- docs/labs/4_ml.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/labs/4_ml.md b/docs/labs/4_ml.md index 654222d..de756e5 100644 --- a/docs/labs/4_ml.md +++ b/docs/labs/4_ml.md @@ -91,8 +91,8 @@ You can also train and deploy models on Vertex in the UI only. Let's have a more Let's have a look at the Pipeline as well. -1. Click Pipelines in the navigation menu -2. Click bootkon-pipeline-... +1. Click Pipelines in the navigation menu +2. Click bootkon-pipeline-... You can now see the individual steps in the pipeline. Please click through the individual steps of the pipeline and have a look at the *Pipeline run analysis* on the right hand side as you cycle pipeline steps. 
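Patch 17 above adds a guard to `bk-start`, and the tutorial text in patch 21 explains the contract behind it: `. bk` must be *sourced* so that `export BK_INITIALIZED=1` lands in the caller's shell. A simplified sketch of that guard (adapted from patch 17, not a verbatim copy; `require_bk` is an invented helper name):

```shell
# Refuse to proceed unless the environment from '. bk' is present.
require_bk() {
  if [ -z "$BK_INITIALIZED" ]; then
    echo "Bootkon has not been initialized." >&2
    echo "Please execute: . bk (including the dot)" >&2
    return 1
  fi
}

# In a fresh terminal BK_INITIALIZED is unset, so the guard fires:
require_bk || echo "guard fired; run '. bk' first"
```

Using `return` rather than `exit` keeps the helper safe to call from a sourced context, where `exit` would terminate the user's interactive shell.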
From 7aa35613f68a5a6a734b9acef03cd7278d61474e Mon Sep 17 00:00:00 2001 From: Daniel Quinlan Date: Wed, 22 Oct 2025 13:25:57 +0200 Subject: [PATCH 23/23] make explicit: deploy model after endpoint created --- src/ml/pipeline.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/ml/pipeline.py b/src/ml/pipeline.py index 6460f22..ccbf156 100644 --- a/src/ml/pipeline.py +++ b/src/ml/pipeline.py @@ -77,7 +77,7 @@ def pipeline( display_name="bootkon-endpoint", ) - ModelDeployOp( + model_deploy_op = ModelDeployOp( endpoint=endpoint_create_op.outputs["endpoint"], model=model_upload_op.outputs["model"], deployed_model_display_name="bootkon-endpoint", @@ -85,6 +85,7 @@ def pipeline( dedicated_resources_min_replica_count=1, dedicated_resources_max_replica_count=1 ) + model_deploy_op.after(endpoint_create_op) compiler.Compiler().compile(
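Patch 23 binds the deploy step to `model_deploy_op` so the pipeline can add an explicit ordering edge with `model_deploy_op.after(endpoint_create_op)`. The dependency this records, that deployment runs only once the endpoint exists, can be sketched with a toy stand-in (illustrative Python only, not the KFP/Vertex AI SDK; `Step` and `run_order` are invented names):

```python
class Step:
    """Toy stand-in for a pipeline task; NOT the real KFP/Vertex AI SDK."""

    def __init__(self, name):
        self.name = name
        self.deps = []

    def after(self, other):
        # Records an explicit ordering edge, analogous to
        # model_deploy_op.after(endpoint_create_op) in the patch.
        self.deps.append(other)
        return self


def run_order(steps):
    """Return an execution order where every step follows its dependencies."""
    done, order = set(), []

    def visit(step):
        if step.name in done:
            return
        for dep in step.deps:  # run all predecessors first
            visit(dep)
        done.add(step.name)
        order.append(step.name)

    for step in steps:
        visit(step)
    return order


endpoint_create = Step("create-endpoint")
model_deploy = Step("deploy-model").after(endpoint_create)

print(run_order([model_deploy, endpoint_create]))
# prints ['create-endpoint', 'deploy-model']
```

In the real pipeline the data edge (`endpoint_create_op.outputs["endpoint"]` feeding `ModelDeployOp`) already implies this ordering; the explicit `.after()` call simply makes the intent unmistakable to readers of the pipeline definition.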