feat(storage): implement MDBX storage backend #16

base: main

Conversation
feat: add EVM configuration and receipts modules
…nto feat/el-database # Conflicts: # crates/execution/Cargo.toml # crates/execution/src/lib.rs # crates/execution/src/types.rs
feat: add database and state management modules
Force-pushed c9d0aa9 to e26b860
feat: migrate to revm 33.1.0
- Added MDBX-backed implementation for DCL storage operations.
- Introduced `Database`, `MdbxDclStore`, and `MdbxWal` for persistent storage.
- Created custom table definitions for DCL and consensus data.
- Implemented serialization and deserialization for WAL entries.
- Added feature flag for enabling MDBX support.
- Updated documentation to reflect new usage and configuration options.
… error handling improvements
Force-pushed 02d0fb2 to bbf5722
…ons and add .serena to .gitignore
…nts in DclStore implementation
…WAL implementation with full persistence and recovery features
…ction and enhance transaction support
Force-pushed 3241594 to 8c47368
Nice work on the MDBX storage layer! Using reth-db makes sense here.
Needs fixing
1. Transaction atomicity
In put_car, you're writing to both Cars and CarsByHash - if the second write fails, rollback isn't handled cleanly. Maybe add a small atomic_write helper? Would be reusable elsewhere too.
2. Dead code attributes
The #[allow(dead_code)] on those conversion functions can go - they're actually being used in the DclStore impl below.
3. Magic numbers
Would be nice to pull out 96 (BLS sig size) and similar into named constants.
Future stuff (not blocking)
- LRU cache for `get_highest_car_position` if it gets called a lot
- Might want to look at `borsh` if bincode versioning becomes an issue
- `has_batch` could skip deserialization and just check key existence
```rust
        .map_err(|e| StorageError::Database(format!("Failed to check batch: {e}")))?
        .is_some();

    if existed {
        tx.delete::<Batches>(key, None)
            .map_err(|e| StorageError::Database(format!("Failed to delete batch: {e}")))?;
        tx.commit()
            .map_err(|e| StorageError::Database(format!("Failed to commit delete: {e}")))?;
        debug!(?hash, "Batch deleted");
    }

    Ok(existed)
}

// ============================================================
// Car Operations
// ============================================================
```
Transaction Atomicity Concern
The put_car method writes to two tables (Cars and CarsByHash) within the same transaction, which is good. However, if the second put fails after the first succeeds, we need to ensure the transaction is properly rolled back.
Currently, MDBX transactions will rollback on drop if not committed, but we should make this explicit for clarity and add a helper for multi-table writes.
Suggestion:

```rust
async fn put_car(&self, car: Car) -> Result<()> {
    use super::tables::{BincodeValue, Cars, CarsByHash};
    use reth_db_api::transaction::DbTxMut;

    let hash = car.hash();
    let key = CarTableKey::new(car.proposer.as_bytes(), car.position);
    let stored = Self::car_to_stored(&car);
    let hash_key = HashKey::from_slice(hash.as_bytes());

    trace!(proposer = ?car.proposer, position = car.position, "Storing car");

    let tx = self.db.tx_mut()?;

    // Use a closure to ensure atomicity - if any operation fails,
    // the transaction will be rolled back when dropped
    let result = (|| -> Result<()> {
        tx.put::<Cars>(key, BincodeValue(stored))
            .map_err(|e| StorageError::Database(format!("Failed to put car: {e}")))?;
        tx.put::<CarsByHash>(hash_key, key)
            .map_err(|e| StorageError::Database(format!("Failed to put car index: {e}")))?;
        Ok(())
    })();

    match result {
        Ok(()) => {
            tx.commit()
                .map_err(|e| StorageError::Database(format!("Failed to commit car: {e}")))?;
            debug!(?hash, "Car stored");
            Ok(())
        }
        Err(e) => {
            // Transaction will be rolled back on drop
            tracing::warn!(?hash, error = %e, "Failed to store car, rolling back");
            Err(e)
        }
    }
}
```

Alternatively, consider adding a helper method for atomic multi-table operations:
```rust
impl MdbxDclStore {
    /// Execute multiple operations atomically
    fn atomic_write<F, T>(&self, f: F) -> Result<T>
    where
        F: FnOnce(&<Database as reth_db_api::Database>::TXMut) -> Result<T>,
    {
        let tx = self.db.tx_mut()?;
        let result = f(&tx)?;
        tx.commit()
            .map_err(|e| StorageError::Database(format!("Commit failed: {e}")))?;
        Ok(result)
    }
}
```

```rust
        .collect();

    let parent_ref = stored.parent_ref.map(Hash::from_bytes);

    let sig_bytes: [u8; 96] = stored
        .signature
```
Magic Numbers for Cryptographic Sizes
The BLS signature size (96 bytes) appears as a magic number. This makes the code harder to maintain and could lead to inconsistencies if the value needs to change.
Suggestion: Define constants at the module or crate level:

```rust
/// Size of a BLS signature in bytes (G2 point in compressed form)
pub const BLS_SIGNATURE_SIZE: usize = 96;

/// Size of a BLS aggregate signature in bytes
pub const BLS_AGGREGATE_SIGNATURE_SIZE: usize = 96;

/// Size of a ValidatorId in bytes
pub const VALIDATOR_ID_SIZE: usize = 20;
```

Then use them throughout:

```rust
let sig_bytes: [u8; BLS_SIGNATURE_SIZE] = stored
    .signature
    .as_slice()
    .try_into()
    .map_err(|_| StorageError::Database(
        format!("Invalid BLS signature length: expected {BLS_SIGNATURE_SIZE} bytes")
    ))?;
```

This also improves error messages by showing the expected size.
```rust
// ============================================================

#[allow(dead_code)]
```
Remove #[allow(dead_code)] Attributes
These conversion helpers are actively used by the DclStore implementation below. The #[allow(dead_code)] attributes suggest they might have been added during development when the trait methods weren't implemented yet.
Now that the implementation is complete, please remove these attributes. If certain helpers truly aren't used, consider removing them entirely to reduce maintenance burden.
Suggestion: Remove the #[allow(dead_code)] attributes from all conversion helpers:

```rust
fn batch_to_stored(batch: &Batch) -> StoredBatch {
    // ...
}

fn stored_to_batch(stored: StoredBatch, _hash: Hash) -> Batch {
    // ...
}

fn car_to_stored(car: &Car) -> StoredCar {
    // ...
}

// ... etc
```

If the compiler then reports unused functions, either:
- Remove them if truly unused
- Add `#[cfg(test)]` if only used in tests
- Document why they exist if they're for future use
```rust
let tx = self.db.tx()?;
let result = tx
```
Optimize has_batch with Direct Key Check
The current implementation fetches and deserializes the entire batch just to check existence. For large batches with many transactions, this is wasteful.
Suggestion: Use a more efficient existence check:

```rust
async fn has_batch(&self, hash: &Hash) -> Result<bool> {
    use super::tables::Batches;
    use reth_db_api::cursor::DbCursorRO;
    use reth_db_api::transaction::DbTx;

    let key = HashKey::from_slice(hash.as_bytes());
    let tx = self.db.tx()?;

    // Use cursor seek which doesn't deserialize the value
    let mut cursor = tx.cursor_read::<Batches>()
        .map_err(|e| StorageError::Database(format!("Failed to create cursor: {e}")))?;

    match cursor.seek_exact(key) {
        Ok(Some(_)) => Ok(true),
        Ok(None) => Ok(false),
        Err(e) => Err(StorageError::Database(format!("Failed to check batch: {e}"))),
    }
}
```

This avoids deserializing the batch data when we only need to know if the key exists.
```rust
async fn get_car_by_hash(&self, hash: &Hash) -> Result<Option<Car>> {
    use super::tables::{Cars, CarsByHash};
    use reth_db_api::transaction::DbTx;

    let hash_key = HashKey::from_slice(hash.as_bytes());

    trace!(?hash, "Getting car by hash");

    let tx = self.db.tx()?;

    // Look up the car key in the secondary index
    let car_key = tx
        .get::<CarsByHash>(hash_key)
        .map_err(|e| StorageError::Database(format!("Failed to get car index: {e}")))?;

    match car_key {
        Some(key) => {
            // Fetch the actual car
            let result = tx
                .get::<Cars>(key)
                .map_err(|e| StorageError::Database(format!("Failed to get car: {e}")))?;

            match result {
                Some(bincode_value) => {
                    let car = Self::stored_to_car(bincode_value.0)?;
                    Ok(Some(car))
                }
                None => Ok(None),
            }
        }
```
Consider Caching Highest Positions
The cursor-based approach for finding the highest car position works correctly, but for frequently accessed validators, this could become a bottleneck.
Suggestion for future optimization:
Consider maintaining an in-memory cache for highest positions:

```rust
pub struct MdbxDclStore {
    db: Arc<Database>,
    /// Cache of highest car position per validator (LRU with reasonable limit)
    highest_positions: RwLock<lru::LruCache<ValidatorId, u64>>,
}

impl MdbxDclStore {
    pub fn new(db: Arc<Database>) -> Self {
        Self {
            db,
            highest_positions: RwLock::new(
                lru::LruCache::new(std::num::NonZeroUsize::new(1024).unwrap())
            ),
        }
    }

    async fn get_highest_car_position(&self, validator: &ValidatorId) -> Result<Option<u64>> {
        // Check cache first
        if let Some(&pos) = self.highest_positions.read().peek(validator) {
            return Ok(Some(pos));
        }

        // Fall back to cursor scan
        let pos = self.get_highest_car_position_from_db(validator).await?;
        if let Some(p) = pos {
            self.highest_positions.write().put(*validator, p);
        }
        Ok(pos)
    }
}
```

Remember to update the cache in put_car:

```rust
async fn put_car(&self, car: Car) -> Result<()> {
    // ... existing code ...

    // Update cache
    self.highest_positions.write().put(car.proposer, car.position);
    Ok(())
}
```

This is optional for now but worth considering for production workloads.
```rust
validator_prefix[..copy_len].copy_from_slice(&validator_bytes[..copy_len]);
Self {
    validator_prefix,
    position,
```
Consider Using borsh for Deterministic Serialization
While bincode is fast and compact, its encoding can vary between versions, which could cause issues when:
- Upgrading the `bincode` crate
- Computing hashes over serialized data
- Sharing data between nodes running different versions
For blockchain storage where determinism is critical, consider using borsh (Binary Object Representation Serializer for Hashing):
```rust
use borsh::{BorshSerialize, BorshDeserialize};

#[derive(Clone, Debug, BorshSerialize, BorshDeserialize)]
pub struct StoredBatch {
    pub worker_id: u32,
    pub transactions: Vec<Vec<u8>>,
    pub timestamp: u64,
}
```

Benefits:
- Deterministic: Same input always produces same bytes
- Schema evolution: Better forward compatibility
- Widely used: Standard in Solana, NEAR ecosystems
If you prefer to keep bincode, consider pinning to a specific version and documenting this decision in the module docs.
Resolve conflict in deny.toml by using el-integration comments
- Introduced `MdbxEvmStore` for EVM state storage, supporting account, code, storage, and block hash operations.
- Added `MdbxStakingStore` for staking state management, including validator information and total stake.
- Created necessary tables and key structures for EVM and staking data in MDBX.
- Updated `lib.rs` to include new modules and re-export relevant types.
- Implemented tests for EVM and staking storage functionalities.
…rehensive unit tests
Summary
Implement the foundational MDBX storage backend for CipherBFT per ADR-010. This PR adds the basic structure and skeleton
implementation using reth-db (MDBX) as the persistent storage layer.
Changes
New Files
Modified Files
What's Implemented
What's NOT Implemented (Future Work)
Testing
```sh
# Build without mdbx (default)
cargo check -p cipherbft-storage

# Build with mdbx feature
cargo check -p cipherbft-storage --features mdbx

# Run tests
cargo test -p cipherbft-storage
```
All existing tests pass. The mdbx module compiles successfully with the feature flag enabled.