feat(etl): add 5 new data pipelines for judiciary, ownership, and legislative data #64
Open
lspassos1 wants to merge 1 commit into World-Open-Graph:main
…islative data

New pipelines:
- bcb_liquidacao: Liquidated financial institutions (BankLiquidation nodes)
- stj_dados_abertos: STJ Superior Court decisions (LegalCase nodes)
- cvm_full_ownership_chain: CVM shareholder ownership chains (DETEM_PARTICIPACAO)
- camara_votes_bills: Chamber of Deputies votes and bills (Bill, Vote nodes)
- senado_votes_bills: Senate votes and bills (Bill, SenateVote nodes)

All follow the Pipeline base class pattern:
- Extract from CSV with pandas
- Transform using itertuples() for performance
- Load via Neo4jBatchLoader with UNWIND batching
- Deterministic IDs via hashlib.sha256
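The last two points of the pattern (deterministic IDs and UNWIND batching) can be illustrated with a minimal, stdlib-only sketch. It is not the code from this PR: the helper names `stable_id` and `chunked`, the property names `case_id`, `court`, and `case_number`, and the sample rows are all assumptions made for illustration, and the real `Neo4jBatchLoader` interface is not shown in this conversation. Only the `LegalCase` label and the use of `hashlib.sha256` come from the commit message.

```python
import hashlib
from typing import Any, Iterator


def stable_id(*parts: str) -> str:
    """Hypothetical helper: SHA-256 over normalized key fields.

    The same logical record always hashes to the same ID, so re-running
    a pipeline MERGEs onto existing nodes instead of duplicating them.
    """
    key = "|".join(p.strip().upper() for p in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def chunked(rows: list[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Hypothetical helper: yield fixed-size slices, one per UNWIND call."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# One parameterized statement per batch; property names are assumptions.
MERGE_LEGAL_CASE = (
    "UNWIND $rows AS row "
    "MERGE (c:LegalCase {case_id: row.case_id}) "
    "SET c.court = row.court, c.case_number = row.case_number"
)

# Made-up sample input standing in for the output of the transform step.
raw_rows = [("STJ", "REsp 1"), ("STJ", "REsp 2"), (" stj", "REsp 1 ")]

# Keying by the deterministic ID collapses rows that differ only in
# whitespace or letter case.
records: dict[str, dict[str, Any]] = {}
for court, number in raw_rows:
    cid = stable_id(court, number)
    records[cid] = {
        "case_id": cid,
        "court": court.strip().upper(),
        "case_number": number.strip(),
    }

rows = list(records.values())

# Each payload would be passed as the parameters of MERGE_LEGAL_CASE.
payloads = [{"rows": batch} for batch in chunked(rows, size=1)]
```

With a real driver session, each payload would be sent together with `MERGE_LEGAL_CASE`, so the number of round trips depends on the batch size rather than on the row count.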
Collaborator
Maintainer triage on March 8, 2026: refused for merge in this cycle and kept open. Blockers:
Required next step: either split this into smaller reviewable slices or resubmit after a deeper manual review pass is explicitly scheduled.
Summary
5 new ETL pipelines targeting high-value unbuilt sources from the ingestion priority matrix (docs/data-sources.md).

Scope (ETL only)
- etl/src/bracc_etl/pipelines/bcb_liquidacao.py
- etl/src/bracc_etl/pipelines/stj_dados_abertos.py
- etl/src/bracc_etl/pipelines/cvm_full_ownership.py
- etl/src/bracc_etl/pipelines/camara_votes_bills.py
- etl/src/bracc_etl/pipelines/senado_votes_bills.py
- etl/src/bracc_etl/runner.py (5 imports + registrations)

New Pipelines

| Pipeline | Nodes | Relationship |
| --- | --- | --- |
| bcb_liquidacao | BankLiquidation | REGIME_ESPECIAL |
| stj_dados_abertos | LegalCase | RELATOR_DE |
| cvm_full_ownership_chain | CvmParticipation | DETEM_PARTICIPACAO |
| camara_votes_bills | Bill, Vote | VOTOU |
| senado_votes_bills | Bill, SenateVote | VOTOU |

All follow the Pipeline base class: CSV extract → itertuples() transform → Neo4jBatchLoader load.

Change type
- release:data

Breaking change?

Validation

Public safety and compliance checklist
- PUBLIC_MODE behavior was reviewed (if relevant)
- docs/release/public_boundary_matrix.csv

Risk and rollback
Low risk. Adds new standalone pipeline modules; the only existing file touched is runner.py (5 imports + registrations). Rollback by reverting the commit.