Skip to content

fix(dbt): coalesce NULL county to 'Unspecified' at intermediate layer#93

Merged
SonaliBedge merged 5 commits into
mainfrom
sonali/mfb-1005-data-coalesce-null-county-at-the-mart-layer-to-eliminate
Jun 8, 2026
Merged

fix(dbt): coalesce NULL county to 'Unspecified' at intermediate layer#93
SonaliBedge merged 5 commits into
mainfrom
sonali/mfb-1005-data-coalesce-null-county-at-the-mart-layer-to-eliminate

Conversation

@SonaliBedge

@SonaliBedge SonaliBedge commented May 18, 2026

Copy link
Copy Markdown
Collaborator

Context & Motivation

county is nullable in screener_screen and that nullability propagated through every downstream dbt mart. Dashboard SQL had to work around it with IS NOT DISTINCT FROM, which is harder to read and prevents hash-based join optimization in Postgres. The affected white labels are co_tax_calculator (100% NULL — never collects county) and co (~0.2% NULL from pre-2024-10-14 data and residual FE validation gaps).

Changes Made

  • Added COALESCE(NULLIF(TRIM(ss.county), ''), 'Unspecified') in int_complete_screener_data — the single upstream source all five affected marts derive county from (mart_screener_data, mart_immediate_needs, mart_qualified_benefits, mart_previous_benefits, mart_householdmembers). No mart-level changes needed.
  • Added a not_null dbt test on mart_householdmembers.county and updated its description to document the 'Unspecified' placeholder.

Testing

  • dbt models to run: dbt run --select int_complete_screener_data
  • Terraform plan reviewed: N/A
  • Local Metabase tested via docker-compose: N/A
  • Other manual testing steps:
    Verify NULL county counts drop to 0 after run:
SELECT COUNT(*) FROM mart_householdmembers WHERE county IS NULL;

-- expect: 0

Deployment

  • Metabase container rebuild/redeploy needed: no
  • dbt production run needed (outside nightly cron): yes - run dbt build --select int_complete_screener_data+ to backfill all downstream marts immediately; otherwise NULL rows persist until the next nightly cron
  • Other post-deployment steps: None

Notes for Reviewers

  • The placeholder is Unspecified (consistent with the existing county fallback already present in int_complete_screener_data for empty strings). If the team prefers Unknown as the ticket suggested, that's a one-word change.

  • No dashboard SQL changes are required as a result of this PR, but any existing IS NOT DISTINCT FROM NULL county filters in Metabase can be simplified to = 'Unspecified ' after deployment

  • Known limitations: historical rows in co_tax_calculator will show Unspecified permanently - that white label never collects county by design.

  • Future considerations:

Summary by CodeRabbit

  • Chores
    • Normalized county values across datasets: whitespace trimmed and empty values converted to a standardized "Unspecified".
    • Added county column to several data marts and updated descriptions to reflect normalization.
    • Strengthened validation by enforcing non-null checks on county to ensure complete, consistent location data.

@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key: "base_branches"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2ee23dbb-e339-437d-bc4a-5e63662d3451

📥 Commits

Reviewing files that changed from the base of the PR and between dc4c0d6 and cffdcd7.

📒 Files selected for processing (4)
  • dbt/models/postgres/intermediate/int_complete_screener_data.sql
  • dbt/models/postgres/marts/mart_householdmembers.yml
  • dbt/models/postgres/marts/mart_immediate_needs.yml
  • dbt/models/postgres/marts/mart_qualified_benefits.yml
🚧 Files skipped from review as they are similar to previous changes (2)
  • dbt/models/postgres/marts/mart_householdmembers.yml
  • dbt/models/postgres/intermediate/int_complete_screener_data.sql

📝 Walkthrough

Walkthrough

County values are normalized in the intermediate model (trim, empty→'Unspecified'), SQL function names/formats in the intermediate model are standardized, and downstream marts add county columns with not_null tests and updated descriptions.

Changes

County field data quality improvements

Layer / File(s) Summary
County field source normalization
dbt/models/postgres/intermediate/int_complete_screener_data.sql
The base_table_1 CTE normalizes county by trimming whitespace and converting empty strings to 'Unspecified' using COALESCE(NULLIF(TRIM(ss.county), ''), 'Unspecified').
SQL expression formatting in intermediate model
dbt/models/postgres/intermediate/int_complete_screener_data.sql
Non-functional formatting: uppercase SQL function names (TRIM, COALESCE, TO_CHAR, SUM) in partner and benefit aggregation expressions; logic unchanged.
Marts schema: add/validate county
dbt/models/postgres/marts/mart_householdmembers.yml, dbt/models/postgres/marts/mart_immediate_needs.yml, dbt/models/postgres/marts/mart_qualified_benefits.yml
Added or updated county column definitions with descriptions noting normalization to 'Unspecified' and added tests: [not_null] for each mart where applicable.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately reflects the main change: coalescing NULL county values to 'Unspecified' at the intermediate dbt layer to resolve the issue.
Description check ✅ Passed The PR description comprehensively covers all required template sections: context with issue link, specific changes made, testing steps, deployment instructions, and reviewer notes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sonali/mfb-1005-data-coalesce-null-county-at-the-mart-layer-to-eliminate

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbt/models/postgres/marts/mart_householdmembers.yml`:
- Around line 29-31: Update the description for the county field in
mart_householdmembers (the county column metadata) to indicate that null or
blank source values are normalized to the literal 'Unspecified' as a fallback;
keep the not_null test but change the description string on the county field to
explicitly document the 'Unspecified' fallback semantics so downstream consumers
know null/blank values are coerced to that value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cd4b541d-7291-43df-b32b-8615ecfc9ca1

📥 Commits

Reviewing files that changed from the base of the PR and between 06fd071 and dc4c0d6.

📒 Files selected for processing (2)
  • dbt/models/postgres/intermediate/int_complete_screener_data.sql
  • dbt/models/postgres/marts/mart_householdmembers.yml

Comment thread dbt/models/postgres/marts/mart_householdmembers.yml Outdated
@SonaliBedge SonaliBedge requested review from andreaHG and catonph May 19, 2026 20:56

@IrynaKolh IrynaKolh left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RomannaBidnyk RomannaBidnyk left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!
Just a small suggestion for the PR description deployment section - probably we need to run dbt build --select int_complete_screener_data+ (adding "+" in the end) to make sure all related data is rebuilt as well?


- name: county
description: "County of the head of household."
description: "County of the head of household. Null/blank source values are normalized to 'Unspecified'."

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding this update as well to the other mart files such as:

  • mart_qualified_benefits.yml
  • mart_immediate_needs.yml

Or do we not need to test county there?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a county filter on all the dashboards if that informs the answer to this (which I'm thinking means yes?)

…to sonali/mfb-1005-data-coalesce-null-county-at-the-mart-layer-to-eliminate
@SonaliBedge SonaliBedge requested a review from andreaHG June 3, 2026 00:29
@SonaliBedge SonaliBedge merged commit 68e7209 into main Jun 8, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants