Skip to content

Conversation

@gonzalezzfelipe
Copy link
Contributor

@gonzalezzfelipe gonzalezzfelipe commented Dec 30, 2025

Summary by CodeRabbit

  • New Features

    • Added archive-based UTXO loading capability for block import operations.
    • Introduced --only-archive CLI flag for direct block import to archive, bypassing standard processing steps.
  • Improvements

    • Enhanced error resilience by converting missing input errors to non-fatal warnings.
    • Improved block decoding and UTXO tracking during chain processing.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 30, 2025

📝 Walkthrough

Walkthrough

This PR extends core traits to expose block-produced UTXOs and enable dedicated block decoding, adds archive-based UTXO retrieval, implements archive-aware batch processing for blocks, and introduces a CLI flag for importing blocks directly to archive storage, supplemented by roll module adjustments that reduce error propagation and stub computations.

Changes

Cohort / File(s) Summary
Core trait definitions
crates/core/src/lib.rs
Added produces() method to Block trait returning UtxoMap; added decode_block() method to ChainLogic trait for dedicated block decoding.
Cardano implementation
crates/cardano/src/lib.rs, crates/cardano/src/owned.rs
Implemented decode_block() in CardanoLogic delegating to OwnedMultiEraBlock::decode; implemented produces() in OwnedMultiEraBlock that iterates transaction outputs and collects them into a UtxoMap.
Archive integration
crates/core/src/archive.rs
Added get_utxo() method to ArchiveStore trait with default implementation that decodes transactions and retrieves outputs by TxoRef with era-aware handling.
Batch processing
crates/core/src/batch.rs
Added produces() method to WorkBlock; added load_utxos_using_archive() method to WorkBatch that loads missing UTXOs from archive via get_utxo() calls, supplementing state-based loading.
Archive import facade
crates/core/src/facade.rs
Introduced private execute_batch_into_archive() helper and public import_blocks_to_archive() function to import blocks directly to archive, decoding blocks and loading UTXOs via archive.
Roll module adjustments
crates/cardano/src/roll/mod.rs
Replaced hard failure on missing input UTXOs with warn-level warnings; commented-out state visitations for account/asset/drep/epoch/pool and proposal_logs; stubbed epoch/protocol derivation; disabled UTxO delta application logic.
CLI enhancement
src/bin/dolos/bootstrap/mithril.rs
Added only_archive: bool flag to Mithril bootstrap Args; when enabled, switches batch import to use import_blocks_to_archive instead of standard import_blocks.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Mithril CLI
    participant Facade as Facade Layer
    participant Batch as Batch Processor
    participant Archive as Archive Store
    participant Decoder as Block Decoder

    CLI->>Facade: import_blocks_to_archive(raw_blocks)
    
    Facade->>Batch: decode blocks into WorkBatch
    Batch->>Decoder: decode_block(raw) for each block
    Decoder-->>Batch: WorkBlock instances
    
    Batch->>Batch: collect UTXO dependencies
    Batch->>Batch: compute produced UTXOs via block.produces()
    
    alt Load Missing UTXOs
        Batch->>Archive: get_utxo(txo_ref) for missing
        Archive-->>Batch: Arc<EraCbor> or fallback
    end
    
    Batch->>Archive: commit delta to archive
    Archive-->>Facade: last_slot
    Facade-->>CLI: result

    Note over Batch,Archive: New archive-based import path<br/>bypasses standard state processing
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes span multiple interrelated modules with new control flow paths (archive-based UTXO loading), non-trivial logic in batch processing and archive integration, and concerning adjustments in the roll module (error suppression, stubbed computations). The heterogeneity across archive traits, batch operations, facade functions, and CLI integration, combined with the behavioral modifications in error handling, necessitates careful cross-module reasoning.

Possibly related PRs

Suggested reviewers

  • scarmuega

Poem

🐰 A rabbit hops through archives so grand,
Blocks decoded and UTXOs planned,
No state delays, just storage so fast,
Direct to the archive—a import at last! 🏃‍♂️✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding experimental functionality to build only archive when bootstrapping from Mithril, which aligns with the --only-archive flag and import_blocks_to_archive pathway introduced in the changeset.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
crates/cardano/src/owned.rs (1)

30-46: Remove unnecessary intermediate collection.

The .collect::<Vec<...>>() inside flat_map is redundant—flat_map naturally flattens iterators. This creates an unnecessary allocation per transaction.

🔎 Suggested fix
     fn produces(&self) -> UtxoMap {
         self.view()
             .txs()
             .iter()
             .flat_map(|tx| {
                 tx.produces()
                     .iter()
                     .map(|(at, output)| {
                         (
                             TxoRef(tx.hash(), *at as u32),
                             Arc::new(EraCbor(tx.era().into(), output.encode())),
                         )
                     })
-                    .collect::<Vec<(TxoRef, Arc<EraCbor>)>>()
             })
             .collect()
     }
crates/core/src/archive.rs (1)

301-311: Consider propagating era conversion errors instead of panicking.

Line 306 uses .try_into().unwrap() which will panic if the stored era value is outside the valid range. While this matches patterns elsewhere in the codebase (e.g., EraCbor conversions), a corrupted archive entry could cause a runtime panic.

🔎 Suggested fix
     fn get_utxo(&self, txo_ref: &TxoRef) -> Result<Option<EraCbor>, ArchiveError> {
         let Some(tx) = self.get_tx(txo_ref.0.as_ref())? else {
             return Ok(None);
         };
         let era = tx.era();
-        let decoded = MultiEraTx::decode_for_era(tx.era().try_into().unwrap(), tx.cbor())?;
+        let pallas_era = tx.era().try_into()
+            .map_err(|_| ArchiveError::DecodingError(
+                pallas::codec::minicbor::decode::Error::message("era out of range")
+            ))?;
+        let decoded = MultiEraTx::decode_for_era(pallas_era, tx.cbor())?;
         let Some(output) = decoded.output_at(txo_ref.1 as usize) else {
             return Ok(None);
         };
         Ok(Some(EraCbor(era, output.encode())))
     }
crates/cardano/src/roll/mod.rs (1)

186-193: Experimental: Document why visitors are disabled.

Most state visitors are commented out, which means account balances, asset states, DRep states, epoch stats, and pool states won't be tracked. This is fine for an archive-only experiment, but should be clearly documented.

Consider adding a comment explaining the intent:

// NOTE: For archive-only import, we skip state-tracking visitors.
// Only tx_logs is needed for archive indexing.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb64e93 and a19560a.

📒 Files selected for processing (8)
  • crates/cardano/src/lib.rs
  • crates/cardano/src/owned.rs
  • crates/cardano/src/roll/mod.rs
  • crates/core/src/archive.rs
  • crates/core/src/batch.rs
  • crates/core/src/facade.rs
  • crates/core/src/lib.rs
  • src/bin/dolos/bootstrap/mithril.rs
🧰 Additional context used
🧬 Code graph analysis (6)
crates/cardano/src/owned.rs (2)
crates/core/src/batch.rs (1)
  • produces (70-72)
crates/core/src/lib.rs (1)
  • produces (382-382)
crates/cardano/src/roll/mod.rs (3)
crates/cardano/src/roll/epochs.rs (1)
  • visit_input (220-232)
crates/cardano/src/roll/accounts.rs (1)
  • visit_input (471-492)
crates/cardano/src/roll/txs.rs (2)
  • visit_input (120-140)
  • tx (113-113)
crates/core/src/facade.rs (1)
crates/core/src/batch.rs (5)
  • raw (62-64)
  • default (19-24)
  • default (107-115)
  • decoded (58-60)
  • new (46-52)
crates/core/src/lib.rs (3)
crates/cardano/src/owned.rs (2)
  • produces (30-46)
  • raw (56-58)
crates/core/src/batch.rs (2)
  • produces (70-72)
  • raw (62-64)
crates/cardano/src/lib.rs (1)
  • decode_block (379-383)
crates/core/src/batch.rs (2)
crates/cardano/src/owned.rs (1)
  • produces (30-46)
crates/core/src/lib.rs (1)
  • produces (382-382)
src/bin/dolos/bootstrap/mithril.rs (1)
crates/core/src/facade.rs (2)
  • import_blocks_to_archive (137-154)
  • import_blocks (116-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Check Build
🔇 Additional comments (11)
crates/core/src/lib.rs (2)

382-382: LGTM! Clean trait extension for UTXO production.

The produces() method on Block aligns with the existing depends_on() pattern and enables the new archive-based UTXO loading workflow.


548-549: LGTM! Consistent with existing decode pattern.

The decode_block method mirrors the existing decode_utxo method, providing a clean abstraction for block decoding within the chain logic.

crates/cardano/src/lib.rs (1)

379-383: LGTM! Consistent implementation pattern.

The decode_block implementation follows the same pattern as decode_utxo, delegating to the appropriate OwnedMultiEraBlock::decode method.

src/bin/dolos/bootstrap/mithril.rs (2)

43-46: LGTM! Clear experimental flag.

The --only-archive flag is well-documented and provides a clean way to opt into the experimental archive-only bootstrap path.


230-238: LGTM! Clean conditional import path.

The branching logic is straightforward and both paths handle errors consistently.

crates/cardano/src/roll/mod.rs (2)

338-352: Experimental stubs for epoch, protocol, and pparams.

Hardcoding epoch = 1, protocol = 1, and using PParamsSet::default() effectively disables era-aware processing. This will cause incorrect behavior if the archive-only path ever needs to:

  • Calculate rewards
  • Apply protocol-specific rules
  • Handle epoch boundaries correctly

Consider adding a TODO or explicit comment:

// TODO(archive-only): These are stubs for the experimental archive-only path.
// Real epoch/protocol/pparams derivation is disabled.
let epoch = 1;
let protocol = &1;
let active_params = PParamsSet::default();

367-372: UTxO delta computation is disabled.

The commented-out code at lines 367–372 means utxo_delta will never be populated. While the code defensively handles this with an if let Some() check in commit_state, the comment indicates this is intentional tech-debt—the system is migrating to entity-based delta tracking (via WorkDeltas), which is already populated by builder.crawl(). Until the migration is complete, the stale utxo_delta field should either be removed or actively computed.

crates/core/src/facade.rs (2)

40-54: LGTM! Clean archive-only batch processing.

The helper correctly sequences archive-based UTXO loading, decoding, delta computation, and archive commit while skipping state-related steps.


137-154: Memory consideration: All blocks held in single batch.

Unlike import_blocks which drains work incrementally via drain_pending_work, this function accumulates all decoded blocks into a single WorkBatch before processing. For large imports (e.g., full chain bootstrap), this could consume significant memory.

The caller in mithril.rs already chunks at chunk_size (default 500), which mitigates this, but the function signature accepts unbounded input.

Consider documenting the expected usage pattern or adding a size check:

/// Import blocks directly to archive storage.
/// 
/// NOTE: All blocks are held in memory. Callers should batch appropriately
/// (recommended: 500-1000 blocks per call).
crates/core/src/batch.rs (2)

70-72: LGTM! Clean delegation to underlying block.

The produces() method follows the same pattern as other delegating methods like raw() and depends_on().


194-219: LGTM! Well-structured archive-based UTXO loading.

The logic correctly:

  1. Collects dependencies from all blocks
  2. Builds a map of UTXOs produced within this batch
  3. For each dependency, checks archive first, then falls back to batch-produced UTXOs
  4. Silently skips UTXOs not found in either source (handled by flat_map filtering None)

The silent skip for missing UTXOs is acceptable since roll/mod.rs now warns on unresolved inputs.

Comment on lines +264 to +274
match self.utxos.get(&txoref) {
Some(resolved) => {
resolved.with_dependent(|_, resolved| {
visit_all!(self, deltas, visit_input, block, &tx, &input, &resolved);
Result::<_, ChainError>::Ok(())
})?;
}
None => {
warn!(txoref =? txoref, "failed to resolve input");
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find compute_delta function definition and its calls
rg -n "compute_delta" --type rust -B 2 -A 5

Repository: txpipe/dolos

Length of output: 4858


🏁 Script executed:

# Also search for the function containing the code snippet (looking for patterns around utxos.get and warn! for unresolved inputs)
rg -n "failed to resolve input" --type rust -B 10 -A 5

Repository: txpipe/dolos

Length of output: 1240


🏁 Script executed:

# Search for "archive" context to understand path differentiation
rg -n "archive" --type rust -i -B 2 -A 2 | head -100

Repository: txpipe/dolos

Length of output: 4843


🏁 Script executed:

# Get more context around both compute_delta calls in facade.rs
sed -n '15,35p' crates/core/src/facade.rs

Repository: txpipe/dolos

Length of output: 510


🏁 Script executed:

# Also get context around the second call
sed -n '40,55p' crates/core/src/facade.rs

Repository: txpipe/dolos

Length of output: 596


🏁 Script executed:

# Check the function names to understand the two paths
rg -n "^pub fn|^fn" crates/core/src/facade.rs -B 1 | head -20

Repository: txpipe/dolos

Length of output: 559


🏁 Script executed:

# Read the compute_delta function and surrounding context in roll/mod.rs
sed -n '331,380p' crates/cardano/src/roll/mod.rs

Repository: txpipe/dolos

Length of output: 1351


🏁 Script executed:

# Check the function signature and parameters more carefully
sed -n '250,280p' crates/cardano/src/roll/mod.rs

Repository: txpipe/dolos

Length of output: 1074


🏁 Script executed:

# Look for any conditional logic or flags in the input processing
rg -n "visit_input|failed to resolve" crates/cardano/src/roll/mod.rs -B 5 -A 2

Repository: txpipe/dolos

Length of output: 866


🏁 Script executed:

# Check CardanoConfig structure
rg -n "struct CardanoConfig|pub track:" crates/cardano/src/lib.rs -A 10

Repository: txpipe/dolos

Length of output: 38


🏁 Script executed:

# Check if there's any feature flag or conditional behavior
rg -n "archive_only|standard_path|skip.*input" --type rust -i

Repository: txpipe/dolos

Length of output: 38


🏁 Script executed:

# Look at what the batch/work context contains
rg -n "WorkBatch|WorkUnit" crates/core/src/lib.rs -B 2 -A 8 | head -50

Repository: txpipe/dolos

Length of output: 1568


🏁 Script executed:

# Check what happens in visit_input and if it's critical
rg -n "fn visit_input" crates/cardano/src/roll/ -A 15

Repository: txpipe/dolos

Length of output: 4313


🏁 Script executed:

# Look for any comments about unresolved inputs or error handling
rg -n "unresolved|resolve.*input" crates/cardano/src/roll/mod.rs -B 3 -A 3

Repository: txpipe/dolos

Length of output: 294


🏁 Script executed:

# Check if there's any indication of whether missing inputs should be an error
rg -n "ChainError|Result" crates/cardano/src/roll/mod.rs -C 2 | head -80

Repository: txpipe/dolos

Length of output: 1973


Unresolved input handling applies to both standard and archive paths without differentiation.

When a UTXO input cannot be resolved, the code silently warns and continues without recording any deltas. This skips critical processing for all downstream visitors: account stake credential tracking (accounts.rs:471), address unpacking (txs.rs:120), and UTXO coin delta accounting (epochs.rs:220).

Currently, both execute_batch (standard) and execute_batch_into_archive (archive-only) call the same compute_delta function with identical unresolved-input handling. While incomplete data may be acceptable during archive import, skipping unresolved inputs during standard processing could mask data integrity issues that should be detected.

Either restrict this lenient behavior to archive-only execution, or add a configuration flag to control whether unresolved inputs should produce an error in the standard path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants