fix(bakery): fix demographics table column shift and define dataset description #52

Closed
n-issei-777 wants to merge 1 commit into google:main from n-issei-777:fix/bigquery-setup-schema


@n-issei-777
Contributor

Description

This PR fixes two problems in the launchmybakery BigQuery setup script (setup_bigquery.sh): a column misalignment when loading the demographics table, and a DATASET_DESCRIPTION variable that is used but never defined.

1. Fix Demographics Table Column Shift (Data Corruption)

  • The Issue:
    The source CSV file data/demographics.csv contains 8 columns in the following order:
    zip_code,city,neighborhood,median_household_income,total_population,median_age,bachelors_degree_pct,foot_traffic_index
    However, the CREATE TABLE schema in setup_bigquery.sh defined only 7 columns, omitting the 4th column, median_household_income.
    Because bq load maps CSV columns to schema columns by position, and --ignore_unknown_values=true caused the extra trailing value to be dropped silently instead of failing the load, every value from the 4th CSV column onward landed in the wrong schema column:

      • median_household_income (4th CSV column) was loaded into total_population (4th schema column).
      • total_population was loaded into median_age (causing values like 33626 to be stored as the age).
      • median_age was loaded into bachelors_degree_pct.
      • bachelors_degree_pct was loaded into foot_traffic_index.
      • The actual foot_traffic_index (8th CSV column) was ignored entirely and lost.

  • The Fix:
    Added median_household_income INT64 as the 4th column of the CREATE TABLE query in setup_bigquery.sh, so the schema matches the CSV column order and each value loads into its intended column.
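
The positional mismatch described above can be reproduced without BigQuery. The following is a minimal sketch, not code from setup_bigquery.sh: `map_columns` is a hypothetical helper, and the pre-fix schema list assumes the schema's column names match the CSV's apart from the missing median_household_income. It pairs each CSV column with the schema column at the same position and marks any trailing CSV column that has no counterpart.

```shell
#!/bin/sh
# Illustration only: emulate positional CSV-to-schema mapping locally.

# CSV header, as quoted in the PR description.
csv_header="zip_code,city,neighborhood,median_household_income,total_population,median_age,bachelors_degree_pct,foot_traffic_index"

# Pre-fix schema (assumed column names): median_household_income is missing.
old_schema="zip_code,city,neighborhood,total_population,median_age,bachelors_degree_pct,foot_traffic_index"

# Hypothetical helper: print "csv_column -> schema_column" for each position.
# A CSV column beyond the end of the schema is reported as "<dropped>".
map_columns() {
  awk -v csv="$1" -v schema="$2" 'BEGIN {
    n = split(csv, c, ",")
    m = split(schema, s, ",")
    for (i = 1; i <= n; i++)
      printf "%s -> %s\n", c[i], (i <= m ? s[i] : "<dropped>")
  }'
}

map_columns "$csv_header" "$old_schema"
```

With the pre-fix schema, the 4th line pairs median_household_income with total_population and the last line reports foot_traffic_index as dropped. Passing the CSV header as both arguments, which mirrors the schema after the fix, pairs every column with itself. A check like this could run before bq load to catch a schema that has drifted from the CSV.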

2. Define Dataset Description Variable

  • The Issue:
    The bq mk command uses --description "$DATASET_DESCRIPTION", but $DATASET_DESCRIPTION was never initialized anywhere in the script, so it expanded to an empty string and the dataset was created with an empty description.
  • The Fix:
    Defined DATASET_DESCRIPTION="Dataset for MCP Bakery Demo" at the top of the script.
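
As a minimal sketch of the same idea, the variable can be defined once near the top of the script and checked before use. The `require_description` helper is hypothetical and not part of this PR; it relies on the standard `${VAR:?message}` expansion, which aborts a non-interactive shell with the given message when the variable is unset or empty.

```shell
#!/bin/sh
# Define the description once, near the top of the script (value taken from the PR).
DATASET_DESCRIPTION="Dataset for MCP Bakery Demo"

# Hypothetical guard: stop with a clear error instead of silently passing
# an empty string to --description.
require_description() {
  : "${DATASET_DESCRIPTION:?DATASET_DESCRIPTION must be set}"
  printf '%s\n' "$DATASET_DESCRIPTION"
}

require_description
```

Another option is to put `set -u` at the top of the script, which turns any expansion of an unset variable into an error and would have surfaced the missing definition on the first run.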

@n-issei-777
Contributor Author

Resolved by #53, so this PR is no longer needed.
Closing this PR.
