Conversation
…ound-trips - Transaction gains document_type and transaction_type fields - IDuplicateDetector, ITransactionSorting, IIBANGrouping, ITransactionFilter protocols updated to list[Transaction] - DuplicateDetectionStrategy.detect_duplicates() and all create_key() methods accept Transaction objects directly - ServiceRegistry.process_transaction_group() / group_by_iban() operate on list[Transaction]; enrichment helpers mutate tx fields in place - processor.py passes list[Transaction] through pipeline; dicts only at output boundary via transactions_to_dicts() - Unused transactions_to_dicts import removed from pdf_processing_orchestrator - All 10 affected test files updated: fixtures use Transaction.from_dict() and assertions use attribute access (tx.filename etc.)
…utral column names to_dict() defaulted to currency_symbol='€' producing 'Debit €' keys, but column_names uses neutral 'Debit'/'Credit'/'Balance' keys — causing pandas DataFrame to produce empty columns in CSV output.
… regression test - test_transaction_to_dict and test_transactions_to_dicts now assert 'Debit', 'Credit', 'Balance' keys (not 'Debit €' etc.) - New test_transactions_to_dicts_keys_match_default_column_names builds a DataFrame with DEFAULT_COLUMNS names to prove key alignment holds
Code reviewFound 2 issues:
The old code iterated all dict keys and summed any value parseable as a float. 🤖 Generated with Claude Code If this code review was useful, please react with 👍. Otherwise, react with 👎. |
…mount key behaviour Bug 1 (transaction_filter.py): _is_header_transaction was calling tx.to_dict() which includes metadata fields (source_page, confidence_score, extraction_warnings, document_type, transaction_type). These inflated non_empty_fields without contributing to matches, diluting the 50% header-detection threshold. Fixed by inspecting only the five data columns (Date, Details, Debit, Credit, Balance) directly from the Transaction object. Bug 2 (strategies.py): DateAmountDuplicateStrategy.create_key now documents that only monetary fields are summed. The old dict-based code accidentally included source_page and confidence_score in the total — the new code is correct and the change in key values is intentional.
Summary
Closes #71. Migrates the service layer to use
list[Transaction]throughout, eliminating the dict round-trips that previously occurred in every service call. Also fixes a regression introduced during migration where CSV output had empty Debit/Credit/Balance columns.Changes
Transactiongainsdocument_typeandtransaction_typefields;from_dict()/to_dict()updatedIDuplicateDetector,ITransactionSorting,IIBANGrouping,ITransactionFilterprotocols updated tolist[Transaction]DuplicateDetectionStrategy.detect_duplicates()and allcreate_key()methods acceptTransactiondirectlyServiceRegistry.process_transaction_group()/group_by_iban()operate onlist[Transaction]; enrichment helpers mutate tx fields in placeprocessor.pypasseslist[Transaction]through pipeline; dicts only at output boundary viatransactions_to_dicts()transactions_to_dicts()now passescurrency_symbol=""so dict keys ("Debit") match neutralDEFAULT_COLUMNSnames — previously caused empty CSV columnstransactions_to_dictsimport removed frompdf_processing_orchestratorTransaction.from_dict(), assertions use attribute accessType
Testing
make docker-integrationpassed locally (required when touchingDockerfile,entrypoint.sh,docker-compose.yml, orpackages/parser-core/)Checklist