BOBS write routing, streaming POST, and per-pod worker concurrency #192
Merged
Summary
Three commits, all driven by the in-cluster BOBS user-download benchmark and the polytope-config tier-2 stress harness.
1. feat: add BOBS write routing and stress streaming (07e515d)
2. test: make stress-worker output cheap to generate (ce07286)
   Stream. `Bytes` buffer for full chunks so test-worker CPU does not dominate BOBS throughput tests.
3. Stream worker output to BOBS as one request and add per-pod worker concurrency (24fff3a)

   Two real bottlenecks in the producer pipeline, both fixed:

   - Per-chunk POST loop replaced with one streaming POST. `workers/common/src/delivery/bobs.rs` was issuing one POST per chunk to `/write/{key}/{offset}`; replaced with a single streaming POST through BOBS's accept-stream path. In-cluster benchmark: 230.8 MiB/s → 375.1 MiB/s (+63%), poll p50 15.8s → 10.1s.
   - Per-pod worker concurrency. `WorkerConfig.worker_concurrency: usize` is now a required, validated (≥1) field; `run_worker_loop` spawns N tasks sharing the `Arc<P>` processor and `Arc<dyn ResultDelivery>`, and uses `tokio::sync::watch` for graceful shutdown. Heartbeat-per-job preserved. CLI flag `--worker-concurrency` + env override `POLYTOPE_WORKER_CONCURRENCY` (default 1) on test-worker, fdb-worker, mars-worker, polytope-fe-worker.

   Tests:

   - `worker_loop_processes_jobs_concurrently_when_worker_concurrency_is_two`
   - `worker_loop_validates_nonzero_worker_concurrency`

   Concurrency=8 was tried in-cluster and pushed BOBS past the contention knee on the dev vsphere PVC: aggregate dropped to 285 MiB/s, read p50 1.79s → 6.84s p95. Default stays 1; the knob is there for deployments with faster storage.
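For reviewers who want the shape of the per-pod concurrency change without opening the diff, here is a minimal sketch of the pattern: a validated `worker_concurrency`, N workers sharing one `Arc` processor, and a shutdown signal. It is not the code from this PR. To stay dependency-free it swaps tokio tasks for std threads and `tokio::sync::watch` for an `AtomicBool`. The `Processor` trait, `CountingProcessor`, the `u32` job type, the channel-based job queue, and the 10 ms poll interval are all hypothetical; only the names `WorkerConfig`, `worker_concurrency`, and `run_worker_loop` come from the description above.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the PR's `WorkerConfig`; only one field is modelled.
#[derive(Debug)]
struct WorkerConfig {
    worker_concurrency: usize,
}

impl WorkerConfig {
    /// Rejects 0, mirroring the ">= 1" validation described in the PR text.
    fn new(worker_concurrency: usize) -> Result<Self, String> {
        if worker_concurrency == 0 {
            return Err("worker_concurrency must be >= 1".to_string());
        }
        Ok(Self { worker_concurrency })
    }
}

/// Hypothetical processor trait; the real `P` is not shown in the PR text.
trait Processor: Send + Sync {
    fn process(&self, job: u32);
}

/// Toy processor that only counts how many jobs it has seen.
struct CountingProcessor {
    processed: AtomicUsize,
}

impl Processor for CountingProcessor {
    fn process(&self, _job: u32) {
        self.processed.fetch_add(1, Ordering::SeqCst);
    }
}

/// Spawns `worker_concurrency` workers that share one processor and one job queue.
/// Workers exit when the shutdown flag is set or the queue is closed and drained.
fn run_worker_loop<P: Processor + 'static>(
    config: &WorkerConfig,
    processor: Arc<P>,
    jobs: mpsc::Receiver<u32>,
    shutdown: Arc<AtomicBool>,
) -> Vec<thread::JoinHandle<()>> {
    let jobs = Arc::new(Mutex::new(jobs));
    (0..config.worker_concurrency)
        .map(|_| {
            let processor = Arc::clone(&processor);
            let jobs = Arc::clone(&jobs);
            let shutdown = Arc::clone(&shutdown);
            thread::spawn(move || {
                while !shutdown.load(Ordering::SeqCst) {
                    // The lock guard is dropped at the end of this statement,
                    // so the queue is not held while a job is being processed.
                    let next = jobs.lock().unwrap().recv_timeout(Duration::from_millis(10));
                    match next {
                        Ok(job) => processor.process(job),
                        Err(mpsc::RecvTimeoutError::Timeout) => continue,
                        Err(mpsc::RecvTimeoutError::Disconnected) => break,
                    }
                }
            })
        })
        .collect()
}
```

To try it, build a config with `WorkerConfig::new(2)`, send a few jobs down an `mpsc::channel()`, drop the sender, and join the returned handles; every queued job should be counted once. Setting the shutdown flag instead stops workers at their next poll, which may leave queued jobs unprocessed in this simplified version.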
Verification
- `cargo fmt --all -- --check` clean.
- `cargo test -p polytope-worker-common` 5/5.
- `cargo test -p polytope-server-integration-tests` 27/27.
- `cargo test --workspace -- --nocapture` green.
- `FIXED_TAG=majh-dev skaffold build -b eccr.ecmwf.int/polytope/test-worker` then `podman push eccr.ecmwf.int/polytope/test-worker:majh-dev`.
- polytope-config `test_bobs_in_cluster_full_range_throughput` ran end-to-end at 375 MiB/s aggregate with these changes; standalone `bobs-benchmark` in-cluster reaches 378 MiB/s under the same fan-out, confirming the polytope pipeline is at parity with the BOBS-only ceiling on the dev cluster.