-
Notifications
You must be signed in to change notification settings - Fork 47
experiment: Build only archive when bootstrapping from Mithril #827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThis PR extends core traits to expose block-produced UTXOs and enable dedicated block decoding, adds archive-based UTXO retrieval, implements archive-aware batch processing for blocks, and introduces a CLI flag for importing blocks directly to archive storage, supplemented by roll module adjustments that reduce error propagation and stub computations. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as Mithril CLI
participant Facade as Facade Layer
participant Batch as Batch Processor
participant Archive as Archive Store
participant Decoder as Block Decoder
CLI->>Facade: import_blocks_to_archive(raw_blocks)
Facade->>Batch: decode blocks into WorkBatch
Batch->>Decoder: decode_block(raw) for each block
Decoder-->>Batch: WorkBlock instances
Batch->>Batch: collect UTXO dependencies
Batch->>Batch: compute produced UTXOs via block.produces()
alt Load Missing UTXOs
Batch->>Archive: get_utxo(txo_ref) for missing
Archive-->>Batch: Arc<EraCbor> or fallback
end
Batch->>Archive: commit delta to archive
Archive-->>Facade: last_slot
Facade-->>CLI: result
Note over Batch,Archive: New archive-based import path<br/>bypasses standard state processing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes The changes span multiple interrelated modules with new control flow paths (archive-based UTXO loading), non-trivial logic in batch processing and archive integration, and concerning adjustments in the roll module (error suppression, stubbed computations). The heterogeneity across archive traits, batch operations, facade functions, and CLI integration, combined with the behavioral modifications in error handling, necessitates careful cross-module reasoning. Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
crates/cardano/src/owned.rs (1)
30-46: Remove unnecessary intermediate collection.The
.collect::<Vec<...>>()insideflat_mapis redundant—flat_mapnaturally flattens iterators. This creates an unnecessary allocation per transaction.🔎 Suggested fix
fn produces(&self) -> UtxoMap { self.view() .txs() .iter() .flat_map(|tx| { tx.produces() .iter() .map(|(at, output)| { ( TxoRef(tx.hash(), *at as u32), Arc::new(EraCbor(tx.era().into(), output.encode())), ) }) - .collect::<Vec<(TxoRef, Arc<EraCbor>)>>() }) .collect() }crates/core/src/archive.rs (1)
301-311: Consider propagating era conversion errors instead of panicking.Line 306 uses
.try_into().unwrap()which will panic if the stored era value is outside the valid range. While this matches patterns elsewhere in the codebase (e.g.,EraCborconversions), a corrupted archive entry could cause a runtime panic.🔎 Suggested fix
fn get_utxo(&self, txo_ref: &TxoRef) -> Result<Option<EraCbor>, ArchiveError> { let Some(tx) = self.get_tx(txo_ref.0.as_ref())? else { return Ok(None); }; let era = tx.era(); - let decoded = MultiEraTx::decode_for_era(tx.era().try_into().unwrap(), tx.cbor())?; + let pallas_era = tx.era().try_into() + .map_err(|_| ArchiveError::DecodingError( + pallas::codec::minicbor::decode::Error::message("era out of range") + ))?; + let decoded = MultiEraTx::decode_for_era(pallas_era, tx.cbor())?; let Some(output) = decoded.output_at(txo_ref.1 as usize) else { return Ok(None); }; Ok(Some(EraCbor(era, output.encode()))) }crates/cardano/src/roll/mod.rs (1)
186-193: Experimental: Document why visitors are disabled.Most state visitors are commented out, which means account balances, asset states, DRep states, epoch stats, and pool states won't be tracked. This is fine for an archive-only experiment, but should be clearly documented.
Consider adding a comment explaining the intent:
// NOTE: For archive-only import, we skip state-tracking visitors. // Only tx_logs is needed for archive indexing.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
crates/cardano/src/lib.rscrates/cardano/src/owned.rscrates/cardano/src/roll/mod.rscrates/core/src/archive.rscrates/core/src/batch.rscrates/core/src/facade.rscrates/core/src/lib.rssrc/bin/dolos/bootstrap/mithril.rs
🧰 Additional context used
🧬 Code graph analysis (6)
crates/cardano/src/owned.rs (2)
crates/core/src/batch.rs (1)
produces(70-72)crates/core/src/lib.rs (1)
produces(382-382)
crates/cardano/src/roll/mod.rs (3)
crates/cardano/src/roll/epochs.rs (1)
visit_input(220-232)crates/cardano/src/roll/accounts.rs (1)
visit_input(471-492)crates/cardano/src/roll/txs.rs (2)
visit_input(120-140)tx(113-113)
crates/core/src/facade.rs (1)
crates/core/src/batch.rs (5)
raw(62-64)default(19-24)default(107-115)decoded(58-60)new(46-52)
crates/core/src/lib.rs (3)
crates/cardano/src/owned.rs (2)
produces(30-46)raw(56-58)crates/core/src/batch.rs (2)
produces(70-72)raw(62-64)crates/cardano/src/lib.rs (1)
decode_block(379-383)
crates/core/src/batch.rs (2)
crates/cardano/src/owned.rs (1)
produces(30-46)crates/core/src/lib.rs (1)
produces(382-382)
src/bin/dolos/bootstrap/mithril.rs (1)
crates/core/src/facade.rs (2)
import_blocks_to_archive(137-154)import_blocks(116-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Check Build
🔇 Additional comments (11)
crates/core/src/lib.rs (2)
382-382: LGTM! Clean trait extension for UTXO production.The
produces()method onBlockaligns with the existingdepends_on()pattern and enables the new archive-based UTXO loading workflow.
548-549: LGTM! Consistent with existing decode pattern.The
decode_blockmethod mirrors the existingdecode_utxomethod, providing a clean abstraction for block decoding within the chain logic.crates/cardano/src/lib.rs (1)
379-383: LGTM! Consistent implementation pattern.The
decode_blockimplementation follows the same pattern asdecode_utxo, delegating to the appropriateOwnedMultiEraBlock::decodemethod.src/bin/dolos/bootstrap/mithril.rs (2)
43-46: LGTM! Clear experimental flag.The
--only-archiveflag is well-documented and provides a clean way to opt into the experimental archive-only bootstrap path.
230-238: LGTM! Clean conditional import path.The branching logic is straightforward and both paths handle errors consistently.
crates/cardano/src/roll/mod.rs (2)
338-352: Experimental stubs for epoch, protocol, and pparams.Hardcoding
epoch = 1,protocol = 1, and usingPParamsSet::default()effectively disables era-aware processing. This will cause incorrect behavior if the archive-only path ever needs to:
- Calculate rewards
- Apply protocol-specific rules
- Handle epoch boundaries correctly
Consider adding a TODO or explicit comment:
// TODO(archive-only): These are stubs for the experimental archive-only path. // Real epoch/protocol/pparams derivation is disabled. let epoch = 1; let protocol = &1; let active_params = PParamsSet::default();
367-372: UTxO delta computation is disabled.The commented-out code at lines 367–372 means
utxo_deltawill never be populated. While the code defensively handles this with anif let Some()check incommit_state, the comment indicates this is intentional tech-debt—the system is migrating to entity-based delta tracking (viaWorkDeltas), which is already populated bybuilder.crawl(). Until the migration is complete, the staleutxo_deltafield should either be removed or actively computed.crates/core/src/facade.rs (2)
40-54: LGTM! Clean archive-only batch processing.The helper correctly sequences archive-based UTXO loading, decoding, delta computation, and archive commit while skipping state-related steps.
137-154: Memory consideration: All blocks held in single batch.Unlike
import_blockswhich drains work incrementally viadrain_pending_work, this function accumulates all decoded blocks into a singleWorkBatchbefore processing. For large imports (e.g., full chain bootstrap), this could consume significant memory.The caller in
mithril.rsalready chunks atchunk_size(default 500), which mitigates this, but the function signature accepts unbounded input.Consider documenting the expected usage pattern or adding a size check:
/// Import blocks directly to archive storage. /// /// NOTE: All blocks are held in memory. Callers should batch appropriately /// (recommended: 500-1000 blocks per call).crates/core/src/batch.rs (2)
70-72: LGTM! Clean delegation to underlying block.The
produces()method follows the same pattern as other delegating methods likeraw()anddepends_on().
194-219: LGTM! Well-structured archive-based UTXO loading.The logic correctly:
- Collects dependencies from all blocks
- Builds a map of UTXOs produced within this batch
- For each dependency, checks archive first, then falls back to batch-produced UTXOs
- Silently skips UTXOs not found in either source (handled by
flat_mapfilteringNone)The silent skip for missing UTXOs is acceptable since
roll/mod.rsnow warns on unresolved inputs.
| match self.utxos.get(&txoref) { | ||
| Some(resolved) => { | ||
| resolved.with_dependent(|_, resolved| { | ||
| visit_all!(self, deltas, visit_input, block, &tx, &input, &resolved); | ||
| Result::<_, ChainError>::Ok(()) | ||
| })?; | ||
| } | ||
| None => { | ||
| warn!(txoref =? txoref, "failed to resolve input"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find compute_delta function definition and its calls
rg -n "compute_delta" --type rust -B 2 -A 5Repository: txpipe/dolos
Length of output: 4858
🏁 Script executed:
# Also search for the function containing the code snippet (looking for patterns around utxos.get and warn! for unresolved inputs)
rg -n "failed to resolve input" --type rust -B 10 -A 5Repository: txpipe/dolos
Length of output: 1240
🏁 Script executed:
# Search for "archive" context to understand path differentiation
rg -n "archive" --type rust -i -B 2 -A 2 | head -100Repository: txpipe/dolos
Length of output: 4843
🏁 Script executed:
# Get more context around both compute_delta calls in facade.rs
sed -n '15,35p' crates/core/src/facade.rsRepository: txpipe/dolos
Length of output: 510
🏁 Script executed:
# Also get context around the second call
sed -n '40,55p' crates/core/src/facade.rsRepository: txpipe/dolos
Length of output: 596
🏁 Script executed:
# Check the function names to understand the two paths
rg -n "^pub fn|^fn" crates/core/src/facade.rs -B 1 | head -20Repository: txpipe/dolos
Length of output: 559
🏁 Script executed:
# Read the compute_delta function and surrounding context in roll/mod.rs
sed -n '331,380p' crates/cardano/src/roll/mod.rsRepository: txpipe/dolos
Length of output: 1351
🏁 Script executed:
# Check the function signature and parameters more carefully
sed -n '250,280p' crates/cardano/src/roll/mod.rsRepository: txpipe/dolos
Length of output: 1074
🏁 Script executed:
# Look for any conditional logic or flags in the input processing
rg -n "visit_input|failed to resolve" crates/cardano/src/roll/mod.rs -B 5 -A 2Repository: txpipe/dolos
Length of output: 866
🏁 Script executed:
# Check CardanoConfig structure
rg -n "struct CardanoConfig|pub track:" crates/cardano/src/lib.rs -A 10Repository: txpipe/dolos
Length of output: 38
🏁 Script executed:
# Check if there's any feature flag or conditional behavior
rg -n "archive_only|standard_path|skip.*input" --type rust -iRepository: txpipe/dolos
Length of output: 38
🏁 Script executed:
# Look at what the batch/work context contains
rg -n "WorkBatch|WorkUnit" crates/core/src/lib.rs -B 2 -A 8 | head -50Repository: txpipe/dolos
Length of output: 1568
🏁 Script executed:
# Check what happens in visit_input and if it's critical
rg -n "fn visit_input" crates/cardano/src/roll/ -A 15Repository: txpipe/dolos
Length of output: 4313
🏁 Script executed:
# Look for any comments about unresolved inputs or error handling
rg -n "unresolved|resolve.*input" crates/cardano/src/roll/mod.rs -B 3 -A 3Repository: txpipe/dolos
Length of output: 294
🏁 Script executed:
# Check if there's any indication of whether missing inputs should be an error
rg -n "ChainError|Result" crates/cardano/src/roll/mod.rs -C 2 | head -80Repository: txpipe/dolos
Length of output: 1973
Unresolved input handling applies to both standard and archive paths without differentiation.
When a UTXO input cannot be resolved, the code silently warns and continues without recording any deltas. This skips critical processing for all downstream visitors: account stake credential tracking (accounts.rs:471), address unpacking (txs.rs:120), and UTXO coin delta accounting (epochs.rs:220).
Currently, both execute_batch (standard) and execute_batch_into_archive (archive-only) call the same compute_delta function with identical unresolved-input handling. While incomplete data may be acceptable during archive import, skipping unresolved inputs during standard processing could mask data integrity issues that should be detected.
Either restrict this lenient behavior to archive-only execution, or add a configuration flag to control whether unresolved inputs should produce an error in the standard path.
Summary by CodeRabbit
New Features
--only-archiveCLI flag for direct block import to archive, bypassing standard processing steps.Improvements
✏️ Tip: You can customize this high-level summary in your review settings.