lifecycle: Extract auto-pause coordinator into cube-lifecycle-manager#705
lifecycle: Extract auto-pause coordinator into cube-lifecycle-manager#705chenhengqi wants to merge 1 commit into
Conversation
PR #705 Review: Extract auto-pause coordinator into cube-lifecycle-managerOverall: Well-structured refactor. The move from in-container sidecar to standalone service is cleanly executed, the Fleet interface abstraction is sound, and the Redis-based service discovery design is architecturally correct. The one-click deployment wiring is thorough (systemd, compose, prepare/start/stop scripts). Strengths
Key Concerns
Minor
|
| -- also after any error observed on the previous tick — that way a | ||
| -- transient Redis restart never leaves us in a state where the heartbeat | ||
| -- says "alive" but the registry row is gone. | ||
| if not state.registry_pushed then |
There was a problem hiding this comment.
The registry_pushed guard creates a permanent desync scenario. If CLM's pruneExpired ever deletes this proxy's registry entry (e.g., heartbeat TTL expired and proxy recovered), registry_pushed stays true and the HSET is skipped forever. CLM sees a heartbeat ZSET member but HMGET returns nil, so the proxy is invisible until a Redis failure forces registry_pushed back to false.
HSET is idempotent — drop the guard and publish on every tick, or verify the entry exists before skipping.
|
|
||
| -- publish is the single-tick worker. It (re)writes the registry entry the | ||
| -- first time it runs (or after a Redis outage), and always refreshes the | ||
| -- heartbeat score to now_ms. Every network operation is wrapped in pcall |
There was a problem hiding this comment.
Comment claims every network operation is "wrapped in pcall" but the actual red:hset() and red:zadd() calls at lines 63 and 71 are bare calls with only return-value checking. If the redis_iresty library ever throws (not returns nil+err), the error propagates into ngx.timer.every which permanently stops calling publish() — the proxy silently stops heartbeating until nginx restart.
| discSvc *discovery.RedisDiscovery | ||
| staticFleet *discovery.Static | ||
| ) | ||
| if len(cfg.CubeProxyAdminURLs) > 0 && cfg.UseStaticFleet { |
There was a problem hiding this comment.
Setting CUBE_LCM_USE_STATIC_FLEET=1 without CUBE_LCM_PROXY_ADMIN_URLS is silently ignored: the condition requires both len(urls) > 0 && UseStaticFleet. An operator enabling static mode but forgetting URLs gets Redis discovery instead, which may reach the wrong endpoints. Add a validation check in config.Validate().
| // pruneExpired removes heartbeat members whose score is <= cutoff and drops | ||
| // the matching registry entries. Callers pass the same cutoff they used for | ||
| // the read so the window is consistent. | ||
| func (d *RedisDiscovery) pruneExpired(ctx context.Context, cutoff int64) error { |
There was a problem hiding this comment.
Race window: if a proxy's heartbeat score is ≤cutoff at read time (ZRangeByScore line 207) but gets refreshed (ZADD) between this read and the pipeline execution (line 214-218), the ZREMRANGEBYSCORE won't remove the new heartbeat (score > cutoff) but the HDEL still targets this proxy_id from the stale expired list. Result: a live proxy's registry entry is deleted while its heartbeat is intact, making it invisible to CLM until the next Redis failure.
Consider a Lua script (EVAL) to atomically read, ZREMRANGEBYSCORE, and HDEL in one go.
| @@ -129,8 +143,12 @@ func (s *Server) handleHealthz(w http.ResponseWriter, _ *http.Request) { | |||
|
|
|||
There was a problem hiding this comment.
handleReadyz exposes registry_len and fleet_size. The CLM listener defaults to 0.0.0.0:8083. While these are not secrets, they leak cluster topology to any network peer. Consider either restricting /readyz to a separate loopback listener or documenting that operators should firewall port 8083.
| if ldict then | ||
| hb_ms = ldict:get("cube_proxy_heartbeat_last_pushed_ms") | ||
| end | ||
| return reply(ngx.HTTP_OK, { |
There was a problem hiding this comment.
The healthz endpoint exposes free_space() for all three shared dicts (META, STATE, LAST). With the admin listener now potentially on the node IP, this leaks capacity metrics that could be used to profile the system. Consider moving diagnostic data to a separate /admin/stats endpoint and keeping /admin/healthz minimal ({"ok":true}).
| end | ||
|
|
||
| local redis = require "redis_iresty" | ||
| local red = redis:new({ |
There was a problem hiding this comment.
Redis cosocket not pooled — A new Redis connection is created on every heartbeat tick but set_keepalive() is never called, so the cosocket is closed at GC time rather than returned to the pool. At scale (100+ proxy replicas each heartbeating every 5s), this creates ~20 TCP handshakes/second to the Redis server unnecessarily.
Suggestion: Add red:set_keepalive(10000, 100) after the ZADD call (around line 76) to reuse the connection across ticks.
| # node) can push meta / state updates here. Token auth via | ||
| # $cube_admin_token below prevents outside abuse; operators may further | ||
| # tighten this with iptables / security groups. | ||
| CUBE_PROXY_ADMIN_LISTEN="${CUBE_PROXY_ADMIN_LISTEN:-${CUBE_SANDBOX_NODE_IP}}" |
There was a problem hiding this comment.
Admin server now network-reachable — Changing from 127.0.0.1 (hardcoded in the old nginx.conf) to CUBE_SANDBOX_NODE_IP (line 42 default) means the admin server (port 8082) is now reachable across the network. The token auth ($cube_admin_token) defaults to empty. Consider adding a deploy-time check that exits with a clear error when CUBE_ADMIN_TOKEN is empty and the admin listen is not loopback.
| // Replay the current registry snapshot to the newly-arrived | ||
| // proxy. We must not block the discovery refresh loop, so | ||
| // this runs in its own goroutine with a bounded context. | ||
| go replayRegistryTo(rootCtx, pushClient, reg, ep, logger.Named("replay")) |
There was a problem hiding this comment.
Thundering-herd on startup — This goroutine is spawned for each new proxy joining. When N proxies register in the same discovery cycle (guaranteed on cluster restart), this fires N goroutines. Each iterates over every sandbox in the registry and pushes entries one-at-a-time via individual HTTP POST calls. At 100 proxies x 10,000 sandboxes = 1,000,000 sequential HTTP calls, delaying metadata delivery for minutes.
Suggestion: Add a batch admin endpoint to CubeProxy (e.g., POST /admin/meta/batch accepting an array of entries) so replayRegistryTo sends all entries in O(1) HTTP calls per proxy. At minimum, limit concurrent replays with a semaphore.
| CUBE_LCM_CUBEMASTER_URL="${CUBE_LCM_CUBEMASTER_URL:-http://${CUBE_SANDBOX_NODE_IP}:8089}" | ||
|
|
||
| # CLM listens on host network so CubeProxy on other nodes can reach it. | ||
| CUBE_LCM_LISTEN_ADDR="${CUBE_LCM_LISTEN_ADDR:-0.0.0.0:8083}" |
There was a problem hiding this comment.
CLM HTTP server on 0.0.0.0:8083 — The /internal/resume endpoint has no authentication, and binding to 0.0.0.0 makes it reachable from any network interface. Previously this was bound to 127.0.0.1 (sidecar model). Consider authenticating this endpoint with the same CUBE_ADMIN_TOKEN used for admin pushes, or at minimum changing the default to 127.0.0.1 when running in single-host mode.
| // pushClient reads Fleet.Snapshot() on every call, so a later swap-in of | ||
| // the RedisDiscovery Fleet is picked up automatically. We construct the | ||
| // discovery instance below so its onJoin can reference pushClient. | ||
| var pushClient *proxypush.Client |
There was a problem hiding this comment.
Fragile closure capture of nil pushClient — pushClient is declared nil on line 101 and captured by the OnJoin closure on line 113. It only works because discSvc.Run() (started later, line 182) fires after pushClient is assigned on line 123. A future refactoring that starts the discovery loop earlier (e.g., an initial sync inside New()) would produce a nil-pointer panic at runtime with no compile-time warning.
Suggestion: Add a SetOnJoin(func(Endpoint)) method to RedisDiscovery and wire the callback after pushClient is fully initialized on line 123.
|
|
||
| // Redis key names. Kept as package-level vars so tests can point them at a | ||
| // scratch namespace if needed. | ||
| var ( |
There was a problem hiding this comment.
Mutable package-level Redis keys — These exported vars can be mutated by any imported package. The comment says they're mutable so tests can use scratch namespaces, but this creates global state risk: concurrent test cases can interfere with each other, and a stray import could change production key names.
Suggestion: Accept key names as constructor parameters (discovery.New(o Options)), or make them unexported and provide a test-only setter.
| -e 's|^\(\s*listen \)8080\( ssl reuseport;\)|\1__CUBE_PROXY_HTTPS_PORT__\2|' \ | ||
| -e 's|^\(\s*set \$host_proxy_port \)8081;|\1__CUBE_PROXY_HTTP_PORT__;|' \ | ||
| -e 's|^\(\s*set \$host_proxy_port \)8080;|\1__CUBE_PROXY_HTTPS_PORT__;|' \ | ||
| -e 's|^\(\s*listen \)127\.0\.0\.1:8082;|\1__CUBE_PROXY_ADMIN_LISTEN__:8082;|' \ |
There was a problem hiding this comment.
If CUBE_PROXY_ADMIN_LISTEN is not 127.0.0.1, should we require CUBE_ADMIN_TOKEN forcibly?
| -- auto-pause / auto-resume coordination dicts. See nginx.conf admin server | ||
| -- block for routing. | ||
| -- | ||
| -- All requests are loopback-bound (the listen directive is 127.0.0.1:8082); |
There was a problem hiding this comment.
This comment seems to be inaccurate.
2614d4b to
6e0db02
Compare
Split cube-proxy-sidecar out of the CubeProxy image into a standalone control-plane service (CLM), add Redis-backed service discovery so CubeProxy can run in multiple replicas, and wire it into one-click under cube-sandbox-control.target. Wire protocol, Redis schema, and SDK behavior are unchanged. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Split cube-proxy-sidecar out of the CubeProxy image into a standalone control-plane service (CLM), add Redis-backed service discovery so CubeProxy can run in multiple replicas, and wire it into one-click under cube-sandbox-control.target. Wire protocol, Redis schema, and SDK behavior are unchanged.