new pr#19
Conversation
|
@greptile |
Greptile SummaryThis PR introduces a comprehensive trading architecture with strategy lifecycle management, hard safety controls, and multi-tier command routing. The changes add three new core crates ( Major changes:
Critical issues:
Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| .openclaw/extensions/trading-bridge/src/index.ts | Added new trading tools for engine control, strategy management, and risk status with improved event handling |
| crates/trading_daemon/src/main.rs | Major expansion adding strategy lifecycle, risk evaluation, and kill switch logic; potential state synchronization issues with candidate promotion |
| crates/trading_protocol/src/lib.rs | Added new command enums and payload types for Engine, Strategy, and Risk namespaces with comprehensive tests |
| crates/risk_core/src/lib.rs | New hard safety cage implementation with clear limits and thorough validation logic |
| crates/strategy_core/src/lib.rs | New strategy abstraction layer with regime-aware signal generation traits |
Sequence Diagram
sequenceDiagram
participant Agent as OpenClaw Agent
participant Bridge as Trading Bridge
participant Daemon as Trading Daemon
participant Risk as Risk Core
participant Strategy as Strategy Core
Agent->>Bridge: trading_strategy_manage(upload_candidate)
Bridge->>Daemon: Strategy.UploadCandidate
Daemon->>Daemon: Store candidate with gates
Daemon-->>Bridge: {ok: true, candidate}
Bridge-->>Agent: Candidate uploaded
Agent->>Bridge: trading_strategy_manage(promote_candidate)
Bridge->>Daemon: Strategy.PromoteCandidate
Daemon->>Risk: evaluate_promotion(request, snapshot)
Risk->>Risk: Check kill_switch, gates, notional limits
alt Promotion allowed
Risk-->>Daemon: Allow
Daemon->>Strategy: Activate new version
Daemon->>Daemon: Update canary_notional tracking
Daemon->>Daemon: Emit StrategyLifecycle event
Daemon-->>Bridge: {ok: true, strategy, previous_version}
Bridge-->>Agent: Promotion succeeded
else Promotion denied
Risk-->>Daemon: Deny{reason}
Daemon->>Daemon: Emit RiskAlert event
Daemon-->>Bridge: {ok: false, error: reason}
Bridge-->>Agent: Promotion failed
end
Agent->>Bridge: trading_engine_control(killswitch)
Bridge->>Daemon: Engine.KillSwitch
Daemon->>Daemon: Set kill_switch_engaged=true
Daemon->>Daemon: Stop all trading
Daemon->>Daemon: Emit RiskAlert + EngineHealth
Daemon-->>Bridge: {ok: true, state}
Bridge-->>Agent: Kill switch activated
Last reviewed commit: 50637c9
| ( | ||
| "kalshi.market_making", |
There was a problem hiding this comment.
The projected_strategy calculation doesn't account for replacing an existing candidate - it always adds to the current notional. If promoting a candidate to replace existing code, this could incorrectly reject the promotion.
| ( | |
| "kalshi.market_making", | |
| let projected_strategy = request.requested_canary_notional_cents; |
| "ok": false, | ||
| "error": format!("Unknown strategy '{}'", payload.strategy_id) |
There was a problem hiding this comment.
Race condition: strategy_canary_notional is updated after promotion succeeds, but if multiple promotions run concurrently, they could both pass the safety check and exceed limits.
| engine: engine.payload, | ||
| risk: risk.payload, |
There was a problem hiding this comment.
The Event.RiskAlert check was added to handle risk alerts alongside regular alerts
| risk_snapshot: RiskSnapshot::default(), | ||
| safety_policy: HardSafetyPolicy::default(), |
There was a problem hiding this comment.
Consider persisting strategy state to disk - currently all strategy configuration, versions, and candidates are lost on daemon restart
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Pull request overview
This PR significantly expands the trading bridge infrastructure by introducing a comprehensive command hierarchy, strategy management capabilities, and a hard safety cage for autonomous trading operations. The changes transform the basic control daemon into a production-ready trading engine with multi-layered risk controls.
Changes:
- Adds three new core crates (exchange_core, strategy_core, risk_core) providing abstraction layers for trading operations
- Introduces Engine, Strategy, and Risk command namespaces with corresponding handlers for engine control, strategy lifecycle management, and risk monitoring
- Implements strategy candidate upload/promotion workflow with gate-based validation and hard safety policy enforcement
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/trading_protocol/src/lib.rs | Adds EngineCommand, StrategyCommand, RiskCommand enums and RequestKind dispatcher; introduces payload types and event schemas for the expanded protocol |
| crates/trading_daemon/src/main.rs | Implements stateful engine with strategy management, candidate promotion workflow, risk snapshot tracking, and event history |
| crates/tradingctl/src/main.rs | Adds CLI commands for engine control, strategy management, and risk operations |
| crates/risk_core/src/lib.rs | Implements HardSafetyCage with non-bypassable limits for order placement and strategy promotion |
| crates/strategy_core/src/lib.rs | Defines strategy family taxonomy and plugin trait for regime-aware signal generation |
| crates/exchange_core/src/lib.rs | Provides normalized order types and exchange adapter trait for venue-agnostic execution |
| crates/trading_daemon/Dockerfile | Updates build to include new dependency crates |
| crates/trading_daemon/Cargo.toml | Removes kalshi_client dependency, adds risk_core, strategy_core dependencies |
| .openclaw/extensions/trading-bridge/src/index.ts | Adds trading_engine_status, trading_engine_control, trading_strategy_manage, and trading_risk_status tools |
| crates/README.md | Documents new crates and updated command namespaces |
| crates/Cargo.toml | Registers new workspace members |
| crates/Cargo.lock | Reflects dependency tree changes with removal of unused crates |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| state | ||
| .risk_snapshot | ||
| .strategy_canary_notional | ||
| .insert(strategy_id.clone(), payload.requested_canary_notional_cents); |
There was a problem hiding this comment.
The risk snapshot's strategy_canary_notional is being set to the new requested value, but it doesn't account for any existing canary notional from a previous deployment. If a strategy already has a canary deployment with X notional, and is promoted again with Y notional, the total should be checked against X+Y, not just Y. This could allow circumventing the hard safety limits by repeatedly promoting with small increments.
Consider using saturating_add to combine the existing value with the new value, or alternatively, replacing the old value if that's the intended behavior (in which case, the promotion evaluation logic in HardSafetyCage should also be adjusted to account for replacement rather than addition).
| "connected": true | ||
| "connected": true, | ||
| }), | ||
| ), |
There was a problem hiding this comment.
The Status command now returns Engine.Status but is still labeled as mapping to Control.Status in line 336. This creates inconsistency in the command hierarchy. Consider either keeping ControlCommand::Status as is and mapping it appropriately, or documenting clearly that the Status command has been migrated to EngineCommand. The current implementation mixes both approaches which could confuse users.
| ), | |
| ), | |
| // NOTE: ControlCommand::Status is kept for backward compatibility and | |
| // intentionally returns the engine status (Engine.Status) via `status_response`. | |
| // New code should prefer the engine-level status command where appropriate. |
| ControlCommand::Stop => { | ||
| let mut state = state.lock().await; | ||
| state.running = false; | ||
| state.paused = true; | ||
| state.last_command_at_ms = now_ms(); |
There was a problem hiding this comment.
The Stop command sets both running=false and paused=true, while the Pause command also sets both running=false and paused=true. This makes these two commands functionally identical in terms of state changes. Consider differentiating them (e.g., Stop could indicate a clean shutdown intent while Pause indicates temporary suspension) or document why they have identical effects.
| @@ -36,7 +112,7 @@ async fn main() -> Result<()> { | |||
| ensure_socket_parent_dir(&socket_path)?; | |||
| let _lock_file = acquire_single_instance_lock(&lock_path)?; | |||
| let listener = bind_listener(&socket_path)?; | |||
| let state = Arc::new(Mutex::new(EngineState::default())); | |||
| let state = Arc::new(Mutex::new(initial_engine_state())); | |||
|
|
|||
| let terminate = signal::ctrl_c(); | |||
| tokio::pin!(terminate); | |||
| @@ -73,6 +149,72 @@ async fn main() -> Result<()> { | |||
| Ok(()) | |||
| } | |||
|
|
|||
| fn initial_engine_state() -> EngineState { | |||
| let now = now_ms(); | |||
| EngineState { | |||
| running: false, | |||
| paused: false, | |||
| kill_switch_engaged: false, | |||
| risk_tripped: false, | |||
| started_at_ms: now, | |||
| last_command_at_ms: now, | |||
| strategies: default_strategies(), | |||
| recent_events: VecDeque::new(), | |||
| risk_snapshot: RiskSnapshot::default(), | |||
| safety_policy: HardSafetyPolicy::default(), | |||
| } | |||
| } | |||
|
|
|||
| fn default_strategies() -> HashMap<String, StrategyState> { | |||
| [ | |||
| ( | |||
| "kalshi.arbitrage", | |||
| StrategyFamily::Arbitrage, | |||
| "builtin-kalshi-arb", | |||
| ), | |||
| ( | |||
| "kalshi.market_making", | |||
| StrategyFamily::MarketMaking, | |||
| "builtin-kalshi-mm", | |||
| ), | |||
| ( | |||
| "core.mean_reversion", | |||
| StrategyFamily::MeanReversion, | |||
| "builtin-mean-reversion", | |||
| ), | |||
| ( | |||
| "core.momentum", | |||
| StrategyFamily::Momentum, | |||
| "builtin-momentum", | |||
| ), | |||
| ] | |||
| .into_iter() | |||
| .map(|(id, family, source)| { | |||
| ( | |||
| id.to_string(), | |||
| StrategyState { | |||
| id: id.to_string(), | |||
| enabled: true, | |||
| family, | |||
| source: source.to_string(), | |||
| version: 1, | |||
| canary_deployment: false, | |||
| canary_notional_cents: 0, | |||
| active_code_hash: None, | |||
| candidate: None, | |||
| }, | |||
| ) | |||
| }) | |||
| .collect() | |||
| } | |||
|
|
|||
| fn now_ms() -> i64 { | |||
| match SystemTime::now().duration_since(UNIX_EPOCH) { | |||
| Ok(dur) => dur.as_millis() as i64, | |||
| Err(_) => 0, | |||
| } | |||
| } | |||
|
|
|||
| fn socket_path_from_env() -> String { | |||
| std::env::var("TRADING_SOCKET_PATH").unwrap_or_else(|_| DEFAULT_SOCKET_PATH.to_string()) | |||
| } | |||
| @@ -159,61 +301,557 @@ async fn handle_connection(stream: UnixStream, state: Arc<Mutex<EngineState>>) { | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
The trading_daemon crate has no test coverage for the complex state management and command processing logic. Given the critical nature of this trading engine (handling kill switches, risk limits, strategy promotions), comprehensive unit tests should be added to verify state transitions, error conditions, and edge cases. The protocol crate has tests, which shows testing is a codebase convention.
| case "status": | ||
| return "Control.Status"; | ||
| return "Engine.Status"; |
There was a problem hiding this comment.
The command "status" maps to "Engine.Status" in the TypeScript extension, but ControlCommand::Status in the Rust daemon also maps to status_response which returns Engine.Status data. This creates confusion about which namespace the status command belongs to. Additionally, the tradingctl CLI maps the Status command to Engine.Status but labels it as "Legacy status command". Consider consolidating to a single clear API surface.
| strategy.version = strategy.version.saturating_add(1); | ||
| strategy.enabled = true; | ||
| strategy.canary_deployment = true; | ||
| strategy.canary_notional_cents = payload.requested_canary_notional_cents; | ||
| strategy.active_code_hash = Some(payload.code_hash.clone()); | ||
| strategy.source = candidate.source; | ||
| strategy.candidate = None; |
There was a problem hiding this comment.
The StrategyState fields enabled, canary_deployment, canary_notional_cents, and active_code_hash are all being set during promotion, but there's no validation that these changes are consistent with the rest of the system state. For instance, when setting enabled=true automatically, there's no check whether the strategy should actually be enabled based on system state (e.g., if the engine is paused or kill switch is engaged). Consider adding validation or documenting why automatic enabling during promotion is always safe.
| id.to_string(), | ||
| StrategyState { | ||
| id: id.to_string(), | ||
| enabled: true, |
There was a problem hiding this comment.
The default strategies are hardcoded with enabled=true at initialization. This means all strategies are enabled when the daemon starts, even if the engine itself is not running. Consider whether this is the intended behavior, or if strategies should only be enabled after an explicit Start command or after the operator has reviewed the configuration.
| enabled: true, | |
| enabled: false, |
| let mut state = state.lock().await; | ||
| state.paused = true; | ||
| state.running = false; | ||
| state.risk_snapshot.paused = true; | ||
| state.last_command_at_ms = now_ms(); | ||
| let event = engine_health_event(&state); | ||
| push_event(&mut state, event); | ||
| status_response(request, &state) | ||
| } | ||
| EngineCommand::Resume => { | ||
| let mut state = state.lock().await; | ||
| if state.kill_switch_engaged { | ||
| return Envelope::response_to( | ||
| request, | ||
| json!({ | ||
| "ok": false, | ||
| "error": "Kill switch engaged; cannot resume" | ||
| }), | ||
| ); | ||
| } | ||
| state.paused = false; | ||
| state.running = true; | ||
| state.risk_snapshot.paused = false; | ||
| state.last_command_at_ms = now_ms(); | ||
| let event = engine_health_event(&state); | ||
| push_event(&mut state, event); | ||
| status_response(request, &state) | ||
| } | ||
| EngineCommand::KillSwitch => { | ||
| let mut state = state.lock().await; | ||
| state.kill_switch_engaged = true; | ||
| state.risk_tripped = true; | ||
| state.running = false; | ||
| state.paused = true; | ||
| state.risk_snapshot.kill_switch_engaged = true; | ||
| state.risk_snapshot.paused = true; |
There was a problem hiding this comment.
The pause field in risk_snapshot is being set directly in the Pause and Resume commands, but it's not being updated in the Start and Stop commands. This could lead to inconsistent state between state.paused and state.risk_snapshot.paused. Either remove the duplicate field or ensure both are always kept in sync.
| if state.kill_switch_engaged { | ||
| return Envelope::response_to( | ||
| request, | ||
| json!({ | ||
| "ok": false, | ||
| "error": "Kill switch engaged; cannot start until operator reset" | ||
| }), | ||
| ); |
There was a problem hiding this comment.
The kill_switch_engaged check prevents starting the engine, but there's no mechanism provided to reset the kill switch. Once engaged, the kill switch permanently prevents the engine from starting until the daemon is restarted. Consider adding a RiskCommand to reset the kill switch after operator verification, or document the intended recovery procedure.
| serde_json::from_str(raw) | ||
| .unwrap_or_else(|_| serde_json::Value::String(raw.to_string())) |
There was a problem hiding this comment.
The RiskOverride command silently falls back to treating the value as a string if JSON parsing fails. This could mask configuration errors where the operator intended to pass a JSON object but made a syntax error. Consider logging a warning or returning an error when JSON parsing fails, or at minimum document this fallback behavior clearly.
| serde_json::from_str(raw) | |
| .unwrap_or_else(|_| serde_json::Value::String(raw.to_string())) | |
| match serde_json::from_str(raw) { | |
| Ok(v) => v, | |
| Err(e) => { | |
| eprintln!( | |
| "Warning: failed to parse risk override value as JSON ({}). \ | |
| Treating it as a plain string instead: {}", | |
| e, raw | |
| ); | |
| serde_json::Value::String(raw.to_string()) | |
| } | |
| } |
|
@greptile |
Greptile SummaryThis PR adds a comprehensive Rust-based trading infrastructure with OpenClaw integration. The implementation introduces a Unix domain socket daemon that manages trading strategies with hard safety controls, canary deployments, and autonomous code promotion workflows. Major Changes:
Critical Issue:
Architecture Notes:
Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| .openclaw/extensions/trading-bridge/src/index.ts | Expanded trading bridge with new engine controls, strategy management, and risk monitoring tools |
| crates/trading_daemon/src/main.rs | Core daemon with strategy promotion logic and risk evaluation; has race condition in concurrent promotions |
| crates/risk_core/src/lib.rs | Hard safety cage implementation correctly treats promotions as replacements, not additions |
| crates/trading_protocol/src/lib.rs | Protocol definitions for UDS communication with envelope structure and command types |
Sequence Diagram
sequenceDiagram
participant Agent as AI Agent/Client
participant Bridge as Trading Bridge (TS)
participant Daemon as Trading Daemon (Rust)
participant Risk as Risk Core
participant Strategy as Strategy State
Agent->>Bridge: trading_strategy_manage(upload_candidate)
Bridge->>Daemon: Strategy.UploadCandidate
Daemon->>Strategy: Store candidate with gates
Daemon-->>Bridge: {ok: true, candidate}
Bridge-->>Agent: Upload confirmation
Agent->>Bridge: trading_strategy_manage(promote_candidate)
Bridge->>Daemon: Strategy.PromoteCandidate
Daemon->>Risk: evaluate_promotion(request, snapshot)
Risk->>Risk: Check gates, limits, kill switch
Risk-->>Daemon: Allow/Deny decision
alt Promotion Allowed
Daemon->>Strategy: Update version, enable canary
Daemon->>Daemon: Update risk_snapshot.strategy_canary_notional
Daemon-->>Bridge: {ok: true, strategy, version}
Bridge-->>Agent: Promotion success
else Promotion Denied
Daemon-->>Bridge: {ok: false, error: reason}
Bridge-->>Agent: Promotion failed
end
Agent->>Bridge: trading_engine_control(pause)
Bridge->>Daemon: Engine.Pause
Daemon->>Daemon: Set paused=true, running=false
Daemon-->>Bridge: Status response
Bridge-->>Agent: Engine paused
Daemon->>Bridge: Event.RiskAlert
Bridge->>Agent: Broadcast alert message
Last reviewed commit: 4077d8e
| state | ||
| .risk_snapshot | ||
| .strategy_canary_notional | ||
| .insert(strategy_id.clone(), payload.requested_canary_notional_cents); |
There was a problem hiding this comment.
Race condition still exists - if two promotions run concurrently, both read the old strategy_canary_notional, pass the safety check, then both update it sequentially, exceeding the limit. The update happens AFTER the promotion succeeds at line 655-658, but the safety check at line 627 reads the stale value.
There was a problem hiding this comment.
This is a false positive, the code is protected by a Mutex correct?
There was a problem hiding this comment.
You're absolutely right - this is a false positive. The entire request processing path holds the mutex lock:
process_strategy_requestacquires the lock at line 449:let mut state = state.lock().await;- The safety check reads
state.risk_snapshotwhile holding the lock - The promotion updates
state.risk_snapshot.strategy_canary_notionalwhile still holding the same lock - The lock is only released when
stategoes out of scope at the end of the function
Since the mutex is held for the entire duration from reading to writing, no other request can interleave. Concurrent promotion requests will be serialized by the mutex.
My apologies for the incorrect assessment.
| risk_tripped: false, | ||
| started_at_ms: now, | ||
| last_command_at_ms: now, | ||
| strategies: default_strategies(), |
There was a problem hiding this comment.
All strategy state (versions, candidates, active code hashes) is lost on daemon restart. If the daemon crashes during a canary deployment, there's no way to recover the deployment state.
| source: typeof input.source === "string" && input.source.trim() !== "" ? input.source : "agent", | ||
| code_hash: asString(input.code_hash, "code_hash"), | ||
| requested_canary_notional_cents: | ||
| typeof input.requested_canary_notional_cents === "number" | ||
| ? input.requested_canary_notional_cents | ||
| : 250, |
There was a problem hiding this comment.
Default value of 250 cents ($2.50) for canary notional seems very low for any meaningful trading strategy testing. Consider if this should be higher or documented more clearly.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| let candidate = match strategy.candidate.clone() { | ||
| Some(candidate) => candidate, | ||
| None => { | ||
| return Envelope::response_to( | ||
| request, | ||
| json!({ | ||
| "ok": false, | ||
| "error": format!("No uploaded candidate for '{}'", payload.strategy_id) | ||
| }), | ||
| ); | ||
| } | ||
| }; |
There was a problem hiding this comment.
The candidate is cloned from the strategy, but if there's no candidate, an error is returned. However, there's no validation that the candidate hasn't expired or been invalidated since upload.
| if candidate.code_hash != payload.code_hash { | ||
| return Envelope::response_to( | ||
| request, | ||
| json!({ | ||
| "ok": false, | ||
| "error": "candidate code hash mismatch" | ||
| }), | ||
| ); | ||
| } |
There was a problem hiding this comment.
Code hash mismatch check is good for preventing stale promotions, but there's no timestamp or TTL check. An uploaded candidate from days/weeks ago could still be promoted if the hash matches.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
@greptile |
Greptile SummaryThis PR introduces a comprehensive risk management and strategy abstraction layer for the trading daemon. The changes add three new core crates ( Major Changes:
Key Safety Features:
Architecture: Confidence Score: 4/5
|
| Filename | Overview |
|---|---|
| crates/trading_daemon/src/main.rs | added strategy state persistence, candidate TTL validation, and comprehensive safety checks via HardSafetyCage integration |
| crates/risk_core/src/lib.rs | new hard safety cage implementation with non-bypassable risk policies for autonomous trading strategies |
| crates/strategy_core/src/lib.rs | new strategy abstraction layer defining StrategyPlugin trait and regime-aware signal generation |
| crates/exchange_core/src/lib.rs | new exchange abstraction defining ExchangeAdapter trait for venue-agnostic order execution |
| .openclaw/extensions/trading-bridge/src/index.ts | added new strategy management tools (upload_candidate, promote_candidate) and risk status endpoint to trading bridge |
| crates/trading_protocol/src/lib.rs | expanded protocol with strategy lifecycle events, risk alerts, and candidate promotion payloads |
Class Diagram
classDiagram
class TradingDaemon {
+EngineState state
+handle_connection()
+process_request()
+promote_candidate_locked()
+persist_strategy_state()
}
class EngineState {
+HashMap~String, StrategyState~ strategies
+RiskSnapshot risk_snapshot
+HardSafetyPolicy safety_policy
+bool running
+bool paused
}
class StrategyState {
+String id
+bool enabled
+StrategyFamily family
+u64 version
+bool canary_deployment
+i64 canary_notional_cents
+Option~String~ active_code_hash
+Option~StrategyCandidate~ candidate
}
class StrategyCandidate {
+String source
+String code_hash
+i64 requested_canary_notional_cents
+bool compile_passed
+bool replay_passed
+bool paper_passed
+bool latency_passed
+bool risk_passed
}
class HardSafetyCage {
+HardSafetyPolicy policy
+evaluate_promotion()
+evaluate_order()
}
class RiskSnapshot {
+i64 total_notional_cents
+i64 drawdown_cents
+u32 orders_last_minute
+bool kill_switch_engaged
+HashMap~String, i64~ strategy_canary_notional
}
class StrategyPlugin {
<<interface>>
+id() String
+family() StrategyFamily
+evaluate() Result~SignalIntent~
}
class ExchangeAdapter {
<<interface>>
+venue() String
+place_order() Result~OrderAck~
+cancel_order() Result
+sync_positions() Result~Vec~PositionSnapshot~~
}
class TradingBridge {
+TradingClient client
+registerTool trading_engine_status
+registerTool trading_strategy_manage
+registerTool trading_risk_status
}
TradingDaemon --> EngineState
EngineState --> StrategyState
EngineState --> RiskSnapshot
EngineState --> HardSafetyPolicy
StrategyState --> StrategyCandidate
TradingDaemon --> HardSafetyCage
HardSafetyCage --> HardSafetyPolicy
HardSafetyCage --> RiskSnapshot
StrategyPlugin --> StrategyFamily
ExchangeAdapter --> NormalizedOrderRequest
TradingBridge --> TradingDaemon : UDS socket
Last reviewed commit: 6b2ef3d
No description provided.