Conversation
aab69bb to
6b456bb
Compare
cmd/cli/eth_oracle_e2e.go
Outdated
| @@ -0,0 +1,340 @@ | |||
| package main | |||
There was a problem hiding this comment.
Initial Reveiw
Good
- Code looks really high quality
- Dynamically extracts the ERC20 info by using eth_call
- Testing is robust and well done
- Good abstraction
Concerns
- Inspecting every ERC20 before checking if it includes the lock order / close order bytes is expensive and dangerous. Could lead to the oracle falling behind. Why not skip any ERC20 transfer that doesn't have 'extra data'?
- "remove lock order from store if it was not found in the order book" is wrong - it should remove lock from the store if the order was 'locked'
- Safe block logic and potential re-orgs (non-actionable - just noting)
- Use of JSON file is
non-atomicand very susceptible to corruption. Should use a badgerDB instance instead. - Does
err := <-sub.Err():mean that the connection dropped? If notp.connectWithRetry()could potentially cause multiple simultaneous subscribers. - I don't think a separate package at 'root' level for testing is necessary. Please move to the oracle package if possible
- Potentially too much abstraction at the ETH transaction level. Would like to see code footprint reduced a bit if possible
Missing logic:
- Marking orders included in a 'proposal' as 'pending' to not duplicate orders
- Smart 'resubmit' logic
- Smart 'waiting period' for preventing brand new locks and closes from being submitted to allow slow nodes to process orders on Ethereum
- Removal of native 'lock' and 'close' order logic
There was a problem hiding this comment.
Files are written atomically by writing to a temporary file then moving that file into place. This ensures only a successfully written file is moved into place.
This should be sufficient for safety guarantees and also lends itself to easier administration and debugging.
There was a problem hiding this comment.
Yes, but speed is a concern - as we can't have the oracle falling behind. Do you forsee this being a bottleneck?
There was a problem hiding this comment.
Yes, but speed is a concern - as we can't have the oracle falling behind. Do you forsee this being a bottleneck?
In general I do not. On my system a benchmark wrote, read and removed 80k orders per second with sync.
How many orders should we be prepared to handle on a node that is very far behind?
One advantage to using BadgerDB though is adding commands to the canopy binary rather than using standard shell tools to query & troubleshoot.
26efc88 to
1a1ccaf
Compare
a54880f to
8f492a1
Compare
75511d9 to
462a82d
Compare
Refactor EthBlockProvider architecture and add comprehensive test coverage - Refactor EthBlockProvider to use centralized configuration struct - Add calculateFetchRange function with proper height management and reorg protection - Simplify processBlocks to use internal safeHeight state - Add comprehensive table-driven tests for calculateFetchRange function - Update TestEthBlockProvider_processBlocks to match simplified function signature - Add transaction data size validation with 1024 byte limit - Improve error handling and retry logic for transaction processing - Add extensive documentation and flow diagrams - Enhance oracle state management and validation - Add new CLI query commands and RPC endpoints
- Add debug logs for RPC and WebSocket connection closing - Add info log before connection retry with delay duration - Add error log when websocket client is not initialized - Add error log when subscription error is received - Add info log when oracle is not enabled
Add 18 new Prometheus metrics for oracle monitoring across 5 categories: Block height metrics: - canopy_oracle_last_processed_height - canopy_oracle_confirmation_lag - canopy_oracle_orders_awaiting_confirmation - canopy_oracle_reorg_rollback_depth (histogram) Order lifecycle metrics: - canopy_oracle_orders_not_in_orderbook_total - canopy_oracle_orders_duplicate_total - canopy_oracle_orders_archived_total - canopy_oracle_lock_orders_committed_total - canopy_oracle_close_orders_committed_total Validation failure metrics: - canopy_oracle_validation_failures_total (labeled by reason) Submission tracking metrics: - canopy_oracle_orders_held_awaiting_safe_total - canopy_oracle_orders_held_propose_delay_total - canopy_oracle_orders_held_resubmit_delay_total - canopy_oracle_lock_order_resubmissions_total - canopy_oracle_close_order_resubmissions_total Store operation metrics: - canopy_oracle_store_write_errors_total - canopy_oracle_store_read_errors_total - canopy_oracle_store_remove_errors_total Also fixes test to use nil metrics (methods are nil-safe).
Add 23 new Prometheus metrics for better observability of Ethereum block provider operations: High Priority - Connection & Sync: - RPC/WS connection attempts and errors - Connection state tracking (disconnected/connecting/rpc/fully_connected) - Sync status (unsynced/syncing/synced) - Block height lag High Priority - Block Processing: - Block fetch errors by type - Processing timeouts and batch sizes - Reorg detection Medium Priority - Transaction Processing: - Transactions total, parse errors, retry by attempt - Exhausted retries, success status breakdown - Receipt fetch errors Medium Priority - Order Detection & Token Cache: - ERC20 transfer, lock/close order detection - Order validation errors by type - Token info fetch errors and contract call timeouts
Add three new gauge metrics for tracking Ethereum block heights: - canopy_eth_chain_head_height: latest block from chain head - canopy_eth_last_processed_height: last block successfully processed - canopy_eth_safe_height: blocks sent through channel (confirmed) These complement the existing block_height_lag metric for better visibility into sync progress and block processing state.
Document all Prometheus metrics for the oracle system including: - Oracle metrics: block heights, order lifecycle, validation, submissions - Eth metrics: connection, sync, block processing, transactions, orders Includes example Prometheus queries, alerting recommendations, and Grafana dashboard tips for effective monitoring.
Apply gofmt formatting fixes: - Reorder imports alphabetically - Align struct field assignments - Normalize comment list indentation - Add missing trailing newlines - Remove extra blank lines
Change lock order safe height check from Debug to Info for better visibility when orders are being held back due to confirmation requirements.
- Update shouldSubmit() to log specific hold reasons at Info level - Log propose delay with blocks needed until eligible - Log resubmit delay with exact eligibility root height - Log lock order cooldown with blocks remaining - Log already submitted cases for both lock and close orders - Remove redundant logs from WitnessedOrders that assumed wrong reason
Rename the CanopyOrders RPC endpoint and related code to OracleOrders for better clarity about the endpoint's purpose. - Route path: /v1/query/canopy-orders -> /v1/query/oracle-orders - CLI command: canopy-orders -> oracle-orders - Handler, client method, and response type renamed accordingly
This PR implements an Ethereum transaction Oracle allowing for cross-chain transfers to Canopy.