fix(dbt): coalesce NULL county to 'Unspecified' at intermediate layer#93
Conversation
|
Note
|
| Layer / File(s) | Summary |
|---|---|
County field source normalization dbt/models/postgres/intermediate/int_complete_screener_data.sql |
The base_table_1 CTE normalizes county by trimming whitespace and converting empty strings to 'Unspecified' using COALESCE(NULLIF(TRIM(ss.county), ''), 'Unspecified'). |
SQL expression formatting in intermediate model dbt/models/postgres/intermediate/int_complete_screener_data.sql |
Non-functional formatting: uppercase SQL function names (TRIM, COALESCE, TO_CHAR, SUM) in partner and benefit aggregation expressions; logic unchanged. |
Marts schema: add/validate county dbt/models/postgres/marts/mart_householdmembers.yml, dbt/models/postgres/marts/mart_immediate_needs.yml, dbt/models/postgres/marts/mart_qualified_benefits.yml |
Added or updated county column definitions with descriptions noting normalization to 'Unspecified' and added tests: [not_null] for each mart where applicable. |
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title accurately reflects the main change: coalescing NULL county values to 'Unspecified' at the intermediate dbt layer to resolve the issue. |
| Description check | ✅ Passed | The PR description comprehensively covers all required template sections: context with issue link, specific changes made, testing steps, deployment instructions, and reviewer notes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Commit unit tests in branch
sonali/mfb-1005-data-coalesce-null-county-at-the-mart-layer-to-eliminate
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dbt/models/postgres/marts/mart_householdmembers.yml`:
- Around line 29-31: Update the description for the county field in
mart_householdmembers (the county column metadata) to indicate that null or
blank source values are normalized to the literal 'Unspecified' as a fallback;
keep the not_null test but change the description string on the county field to
explicitly document the 'Unspecified' fallback semantics so downstream consumers
know null/blank values are coerced to that value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cd4b541d-7291-43df-b32b-8615ecfc9ca1
📒 Files selected for processing (2)
dbt/models/postgres/intermediate/int_complete_screener_data.sqldbt/models/postgres/marts/mart_householdmembers.yml
RomannaBidnyk
left a comment
There was a problem hiding this comment.
Looks great to me!
Just a small suggestion for the PR description deployment section - probably we need to run dbt build --select int_complete_screener_data+ (adding "+" in the end) to make sure all related data is rebuilt as well?
|
|
||
| - name: county | ||
| description: "County of the head of household." | ||
| description: "County of the head of household. Null/blank source values are normalized to 'Unspecified'." |
There was a problem hiding this comment.
Is it worth adding this update as well to the other mart files such as:
- mart_qualified_benefits.yml
- mart_immediate_needs.yml
Or do we not need to test county there?
There was a problem hiding this comment.
We have a county filter on all the dashboards if that informs the answer to this (which I'm thinking means yes?)
… for immediate need and qualified benefits
…to sonali/mfb-1005-data-coalesce-null-county-at-the-mart-layer-to-eliminate
Context & Motivation
countyis nullable inscreener_screenand that nullability propagated through every downstream dbt mart. Dashboard SQL had to work around it withIS NOT DISTINCT FROM, which is harder to read and prevents hash-based join optimization in Postgres. The affected white labels areco_tax_calculator(100% NULL — never collects county) andco(~0.2% NULL from pre-2024-10-14 data and residual FE validation gaps).Changes Made
COALESCE(NULLIF(TRIM(ss.county), ''), 'Unspecified')inint_complete_screener_data— the single upstream source all five affected marts derive county from (mart_screener_data, mart_immediate_needs,mart_qualified_benefits,mart_previous_benefits,mart_householdmembers). No mart-level changes needed.not_nulldbt test onmart_householdmembers.countyand updated its description to document the 'Unspecified' placeholder.Testing
dbt run --select int_complete_screener_dataVerify NULL county counts drop to 0 after run:
-- expect: 0
Deployment
dbt build --select int_complete_screener_data+to backfill all downstream marts immediately; otherwise NULL rows persist until the next nightly cronNotes for Reviewers
The placeholder is
Unspecified(consistent with the existingcountyfallback already present inint_complete_screener_datafor empty strings). If the team prefersUnknownas the ticket suggested, that's a one-word change.No dashboard SQL changes are required as a result of this PR, but any existing
IS NOT DISTINCT FROM NULLcounty filters in Metabase can be simplified to= 'Unspecified 'after deploymentKnown limitations: historical rows in
co_tax_calculatorwill showUnspecifiedpermanently - that white label never collects county by design.Future considerations:
Summary by CodeRabbit