feat(etl): add 5 new data pipelines for judiciary, ownership, and legislative data #64

Open
lspassos1 wants to merge 1 commit into World-Open-Graph:main from lspassos1:feat/new-etl-pipelines-v2

@lspassos1
Contributor

Summary

Adds 5 new ETL pipelines targeting high-value, not-yet-built sources from the ingestion priority matrix (docs/data-sources.md).

Scope (ETL only)

| Action | File |
| --- | --- |
| NEW | etl/src/bracc_etl/pipelines/bcb_liquidacao.py |
| NEW | etl/src/bracc_etl/pipelines/stj_dados_abertos.py |
| NEW | etl/src/bracc_etl/pipelines/cvm_full_ownership.py |
| NEW | etl/src/bracc_etl/pipelines/camara_votes_bills.py |
| NEW | etl/src/bracc_etl/pipelines/senado_votes_bills.py |
| MODIFY | etl/src/bracc_etl/runner.py (5 imports + registrations) |

New Pipelines

| Pipeline | Source | Nodes | Relationships |
| --- | --- | --- | --- |
| bcb_liquidacao | BCB liquidated institutions | BankLiquidation | REGIME_ESPECIAL |
| stj_dados_abertos | STJ Superior Court decisions | LegalCase | RELATOR_DE |
| cvm_full_ownership_chain | CVM shareholder chains | CvmParticipation | DETEM_PARTICIPACAO |
| camara_votes_bills | Chamber votes and bills | Bill, Vote | VOTOU |
| senado_votes_bills | Senate votes and bills | Bill, SenateVote | VOTOU |

All follow the Pipeline base class: CSV extract → itertuples() transform → Neo4jBatchLoader load.
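For reviewers unfamiliar with the pattern, here is a rough sketch of the extract, transform, load flow. It is not the project's code: `SketchPipeline`, `BankLiquidationSketch`, the `name` and `regime` columns, and the plain list used as a loader are all hypothetical stand-ins, and `csv.DictReader` replaces pandas so the sketch has no third-party dependency.

```python
import csv
import io
from abc import ABC, abstractmethod


class SketchPipeline(ABC):
    """Hypothetical stand-in for the project's Pipeline base class."""

    def run(self, source, loader):
        # Chain the three stages in order and report how many records were loaded.
        rows = self.extract(source)
        records = self.transform(rows)
        return self.load(records, loader)

    def extract(self, source):
        # Read CSV text into a list of dicts (stdlib substitute for pandas).
        return list(csv.DictReader(source))

    @abstractmethod
    def transform(self, rows):
        """Turn raw rows into node dictionaries."""

    def load(self, records, loader):
        # A plain list stands in for a batch loader here.
        loader.extend(records)
        return len(records)


class BankLiquidationSketch(SketchPipeline):
    def transform(self, rows):
        out = []
        for row in rows:
            name = row["name"].strip()
            if not name:
                continue  # skip rows without an institution name
            out.append(
                {
                    "label": "BankLiquidation",
                    "name": name,
                    "regime": row["regime"].strip(),
                }
            )
        return out


# Hypothetical sample data: one valid row and one row with an empty name.
sample_csv = io.StringIO("name,regime\nBanco Exemplo ,liquidacao\n,falencia\n")
loaded = []
count = BankLiquidationSketch().run(sample_csv, loaded)
```

Keeping `run` in the base class means each pipeline only has to supply its own `transform`, which is one way a set of sibling pipelines could stay small and individually reviewable.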

Change type

  • release:data

Breaking change?

  • No

Validation

  • Local tests/checks passed for impacted scope
  • CI and Security checks are green
  • Exactly one release label is set on this PR

Public safety and compliance checklist

  • No personal identifier exposure was introduced
  • PUBLIC_MODE behavior was reviewed (if relevant)
  • Public boundary gate is green
  • Public endpoints and demo data contain no personal data fields
  • Legal/policy docs were reviewed for scope-impacting changes
  • Snapshot boundary remains compliant with docs/release/public_boundary_matrix.csv

Risk and rollback

Low risk. Adds new standalone pipeline modules; the only change to existing code is the registration in etl/src/bracc_etl/runner.py (5 imports + registrations). Roll back by reverting the commit.

…islative data

New pipelines:
- bcb_liquidacao: Liquidated financial institutions (BankLiquidation nodes)
- stj_dados_abertos: STJ Superior Court decisions (LegalCase nodes)
- cvm_full_ownership_chain: CVM shareholder ownership chains (DETEM_PARTICIPACAO)
- camara_votes_bills: Chamber of Deputies votes and bills (Bill, Vote nodes)
- senado_votes_bills: Senate votes and bills (Bill, SenateVote nodes)

All follow the Pipeline base class pattern:
- Extract from CSV with pandas
- Transform using itertuples() for performance
- Load via Neo4jBatchLoader with UNWIND batching
- Deterministic IDs via hashlib.sha256
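The last two points can be illustrated with a small sketch. It assumes IDs are derived by hashing a normalized key and that rows are sent to Neo4j in fixed-size chunks through an `UNWIND $rows` query; the helper names `deterministic_id` and `batched`, the `id` and `title` properties, and the query text are hypothetical, not taken from the PR.

```python
import hashlib


def deterministic_id(*parts):
    """Hash normalized key parts so the same input always yields the same ID."""
    key = "|".join(str(p).strip().lower() for p in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def batched(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical Cypher; each batch would be passed as the $rows parameter.
MERGE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (n:LegalCase {id: row.id}) "
    "SET n.title = row.title"
)

# Hypothetical rows keyed by a source prefix and a sequence number.
rows = [{"id": deterministic_id("stj", n), "title": f"case {n}"} for n in range(5)]
batches = list(batched(rows, 2))
```

Because the ID depends only on the key parts, re-running a pipeline over the same input would `MERGE` onto existing nodes rather than create duplicates, which is the usual reason to prefer hashed IDs over generated ones.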
@brunoclz
Collaborator

brunoclz commented Mar 8, 2026

Maintainer triage on March 8, 2026: declined for merge in this cycle; the PR remains open.

Blockers:

  • The PR has about 915 additions, which is outside the conservative churn envelope for this cycle.
  • It is a multi-pipeline ETL feature and needs a deeper manual review or a narrower split.
  • No release label is currently applied on the PR.

Required next step: either split this into smaller reviewable slices or resubmit after a deeper manual review pass is explicitly scheduled.

@brunoclz added the status:denied-cycle (PR denied in current governor cycle) and needs-author-action (Author action required) labels on Mar 8, 2026