feat: add bigquery-alloydb-insights setup scripts and readme#1290
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds setup scripts and documentation for Lab 2 of the Informed AI Roadshow, which involves provisioning AlloyDB, configuring BigQuery datasets and connections, and seeding telemetry and thermal data. The review feedback focuses on improving the robustness of the shell scripts, including enabling pipefail to catch pipeline failures, validating critical variables like PROJECT_ID, CURRENT_USER, and SA_EMAIL to prevent cascading failures, ensuring curl fails on HTTP errors, and using consistent authentication commands.
| # Designed to be kicked off early in the lab and run in the background | ||
| # while attendees work through BigQuery tasks (~10-15 min to complete). | ||
| # --------------------------------------------------------------------------- | ||
| set -e |
There was a problem hiding this comment.
Using only set -e does not enable pipefail. In this script, the output of deploy_alloydb.sh is piped to tee (line 50). Without pipefail, if deploy_alloydb.sh fails, the pipeline's exit status will still be 0 (from tee), causing the script to continue running on a non-existent or failed cluster. Enabling set -euo pipefail ensures the script exits immediately on any command or pipeline failure, and also catches unbound variables.
| set -e | |
| set -euo pipefail |
| echo " You can re-run this step later with: bq mk --connection ..." | ||
| # Note: BigQuery treats AlloyDB as a PostgreSQL Cloud SQL connection under the hood via the REST API. | ||
| curl -s -X POST \ | ||
| -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ |
There was a problem hiding this comment.
Using gcloud auth application-default print-access-token requires the user to have set up Application Default Credentials (ADC) locally or in Cloud Shell, which is typically not done in standard lab environments. Since the script already verifies active user credentials on line 28 using gcloud auth print-access-token, we should use the same command here to ensure consistency and avoid authentication failures.
| -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ | |
| -H "Authorization: Bearer $(gcloud auth print-access-token)" \ |
| # --- Lab-specific configuration --- | ||
| export CLUSTER_NAME="lost-cargo-cluster" | ||
| export INSTANCE_NAME="lost-cargo-instance" | ||
| export PROJECT_ID="${PROJECT_ID:-$(gcloud config get-value project)}" |
There was a problem hiding this comment.
If PROJECT_ID is not set in the environment and gcloud config get-value project returns empty, the script will proceed with an empty PROJECT_ID. This will cause subsequent commands (like GCS bucket IAM binding and API calls) to fail with cryptic errors. We should validate that PROJECT_ID is non-empty and fail explicitly if it is missing, matching the behavior in setup_lab.sh.
| export PROJECT_ID="${PROJECT_ID:-$(gcloud config get-value project)}" | |
| export PROJECT_ID="${PROJECT_ID:-$(gcloud config get-value project 2>/dev/null)}" | |
| if [[ -z "${PROJECT_ID}" ]]; then | |
| echo "ERROR: No project set. Run 'gcloud config set project YOUR_PROJECT_ID' first." >&2 | |
| exit 1 | |
| fi |
References
- When an environment variable is required for the application to function correctly and has no sensible default, do not provide a default value. Instead, enforce that the user must set it, allowing the application to fail explicitly if it's missing.
| curl -X PATCH \ | ||
| -H "Authorization: Bearer $(gcloud auth print-access-token)" \ | ||
| -H "Content-Type: application/json" \ | ||
| "https://alloydb.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${REGION}/clusters/${CLUSTER_NAME}/instances/${INSTANCE_NAME}?updateMask=dataApiAccess" \ | ||
| -d '{ "dataApiAccess": "ENABLED" }' || echo "⚠️ Warning: Failed to enable Data API access." |
There was a problem hiding this comment.
By default, curl does not return a non-zero exit code when the server returns an HTTP error (e.g., 4xx or 5xx). As a result, the || echo "⚠️ Warning: ..." fallback will never be triggered if the API call fails. Adding the -f / --fail flag ensures curl fails on HTTP errors so the warning is correctly printed.
| curl -X PATCH \ | |
| -H "Authorization: Bearer $(gcloud auth print-access-token)" \ | |
| -H "Content-Type: application/json" \ | |
| "https://alloydb.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${REGION}/clusters/${CLUSTER_NAME}/instances/${INSTANCE_NAME}?updateMask=dataApiAccess" \ | |
| -d '{ "dataApiAccess": "ENABLED" }' || echo "⚠️ Warning: Failed to enable Data API access." | |
| curl -s -f -X PATCH \ | |
| -H "Authorization: Bearer $(gcloud auth print-access-token)" \ | |
| -H "Content-Type: application/json" \ | |
| "https://alloydb.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${REGION}/clusters/${CLUSTER_NAME}/instances/${INSTANCE_NAME}?updateMask=dataApiAccess" \ | |
| -d '{ "dataApiAccess": "ENABLED" }' || echo "⚠️ Warning: Failed to enable Data API access." |
| CURRENT_USER=$(gcloud config get-value account) | ||
| echo "[4/5] Creating database user for current IAM principal: ${CURRENT_USER}..." | ||
| gcloud alloydb users create "${CURRENT_USER}" \ | ||
| --cluster="${CLUSTER_NAME}" \ | ||
| --region="${REGION}" \ | ||
| --type=IAM_BASED \ | ||
| --superuser=true || echo "⚠️ Warning: Failed to create database user for ${CURRENT_USER}. You may need to create it manually." |
There was a problem hiding this comment.
If gcloud config get-value account returns empty or fails, CURRENT_USER will be empty, leading to an invalid gcloud alloydb users create command. We should check if CURRENT_USER is non-empty before attempting to create the database user.
| CURRENT_USER=$(gcloud config get-value account) | |
| echo "[4/5] Creating database user for current IAM principal: ${CURRENT_USER}..." | |
| gcloud alloydb users create "${CURRENT_USER}" \ | |
| --cluster="${CLUSTER_NAME}" \ | |
| --region="${REGION}" \ | |
| --type=IAM_BASED \ | |
| --superuser=true || echo "⚠️ Warning: Failed to create database user for ${CURRENT_USER}. You may need to create it manually." | |
| CURRENT_USER=$(gcloud config get-value account 2>/dev/null || echo "") | |
| if [[ -n "$CURRENT_USER" ]]; then | |
| echo "[4/5] Creating database user for current IAM principal: ${CURRENT_USER}..." | |
| gcloud alloydb users create "${CURRENT_USER}" \ | |
| --cluster="${CLUSTER_NAME}" \ | |
| --region="${REGION}" \ | |
| --type=IAM_BASED \ | |
| --superuser=true || echo "⚠️ Warning: Failed to create database user for ${CURRENT_USER}. You may need to create it manually." | |
| else | |
| echo "⚠️ Warning: Could not retrieve current IAM principal. Skipping database user creation." | |
| fi |
| SA_EMAIL=$(bq show --format=prettyjson --connection us.lost_cargo_conn \ | ||
| | grep "serviceAccountId" | cut -d '"' -f 4) | ||
| echo " Connection service account: $SA_EMAIL" | ||
| echo " Waiting 10s for service account to propagate..." | ||
| sleep 10 |
There was a problem hiding this comment.
If the connection creation failed or if bq show fails to retrieve the service account email, SA_EMAIL will be empty. This will cause the subsequent gcloud projects add-iam-policy-binding commands to fail with invalid member errors. We should check if SA_EMAIL is non-empty and fail explicitly if it is missing, similar to how SA_EMAIL_ALLOYDB is handled on line 101.
| SA_EMAIL=$(bq show --format=prettyjson --connection us.lost_cargo_conn \ | |
| | grep "serviceAccountId" | cut -d '"' -f 4) | |
| echo " Connection service account: $SA_EMAIL" | |
| echo " Waiting 10s for service account to propagate..." | |
| sleep 10 | |
| SA_EMAIL=$(bq show --format=prettyjson --connection us.lost_cargo_conn \ | |
| | grep "serviceAccountId" | cut -d '"' -f 4) | |
| if [[ -z "$SA_EMAIL" ]]; then | |
| echo "ERROR: Failed to retrieve service account for connection us.lost_cargo_conn." >&2 | |
| exit 1 | |
| fi | |
| echo " Connection service account: $SA_EMAIL" | |
| echo " Waiting 10s for service account to propagate..." | |
| sleep 10 |
References
- In shell scripts, prefer using standard tools like
grepandcutfor parsing command output over tools likejqif the latter's availability or functionality in the execution environment is not guaranteed.
No description provided.