WIP: An attempt delay fetching build's size for MMap until 1st read#1887
Conversation
Replace the bulk Chunker with a StreamingChunker that reads data progressively from upstream (GCS/NFS), notifying blocked callers as soon as their specific byte range arrives instead of waiting for the entire 4MB chunk. Benchmarks show ~2.3x lower average caller latency under random access patterns (~4.5ms vs ~10.2ms simulated GCS).

- Add StreamingReader interface (OpenRangeReader) to the storage package, implemented on all backends (GCS, FS, AWS, NFS cache)
- Add read-through caching on cachedSeekable.OpenRangeReader
- Embed StreamingReader in the Seekable interface
- Switch StorageDiff from Chunker to StreamingChunker

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert storage_diff.go to use the existing Chunker so the StreamingChunker is not yet on any production code path.

StreamingChunker improvements:
- Sort waiters by endByte for O(satisfied) notification instead of O(all)
- Add a 60s fetch timeout and thread the context through runFetch/progressiveRead
- Add panic recovery in runFetch so waiters never hang on panics
- Add cancelOnCloseReader to the GCS OpenRangeReader for proper context cleanup
- Add nolint:containedctx to cacheWriteThroughReader.ctx
- Add TestStreamingChunker_PanicRecovery test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
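The "sort waiters by endByte" optimization can be sketched as a sorted queue where a progress update pops only the waiters it satisfies. This is an illustrative reconstruction, not the actual StreamingChunker code (which also involves channels, a mutex, and per-chunk state):

```go
package main

import (
	"fmt"
	"sort"
)

// waiter blocks until bytes [0, endByte) of the chunk have arrived.
type waiter struct {
	endByte int64
	done    chan struct{}
}

// waitQueue keeps waiters sorted by endByte ascending so that a progress
// notification only touches the waiters it actually satisfies.
type waitQueue struct {
	waiters []waiter
}

// add inserts a waiter at its sorted position and returns its done channel.
func (q *waitQueue) add(endByte int64) <-chan struct{} {
	w := waiter{endByte: endByte, done: make(chan struct{})}
	i := sort.Search(len(q.waiters), func(i int) bool {
		return q.waiters[i].endByte >= endByte
	})
	q.waiters = append(q.waiters, waiter{})
	copy(q.waiters[i+1:], q.waiters[i:])
	q.waiters[i] = w
	return w.done
}

// notify wakes only the waiters whose range is now fully fetched,
// doing O(satisfied) work instead of scanning every waiter.
func (q *waitQueue) notify(progress int64) int {
	n := 0
	for len(q.waiters) > 0 && q.waiters[0].endByte <= progress {
		close(q.waiters[0].done)
		q.waiters = q.waiters[1:]
		n++
	}
	return n
}

func main() {
	q := &waitQueue{}
	q.add(4096)
	q.add(1024)
	q.add(1 << 20)
	fmt.Println(q.notify(8192)) // wakes the 1024 and 4096 waiters: prints 2
}
```

Because the queue head always holds the smallest endByte, each progress update stops at the first unsatisfied waiter, which is what turns notification from O(all waiters) into O(satisfied waiters).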
The previous benchmark used a zero-latency upstream and reported analytically computed latency predictions. Replace it with a bandwidth-limited upstream (real time.Sleep) tuned to observed production latencies:

- GCS: 20ms TTFB + 100 MB/s → ~62ms per 4MB chunk (observed ~60ms)
- NFS: 1ms TTFB + 500 MB/s → ~9ms per 4MB chunk (observed ~9-10ms)

Results confirm the analytical predictions and additionally surface a ~5ms constant overhead in the old Chunker's post-transfer path (batch setIsCached + WaitMap signaling) that the analytical model could not capture.

Also reduces concurrent callers from 20 to 3 to match realistic UFFD/NBD concurrency (1-2 vCPU faults + 1 NBD request), and pre-generates a shared offset sequence across all sub-benchmarks for direct comparability.

Results (300 iterations × 3 counts, 3 concurrent callers):

    GCS/StreamingChunker  42790 avg-us/caller  63260 worst-us/caller
    GCS/StreamingChunker  42776 avg-us/caller  63221 worst-us/caller
    GCS/StreamingChunker  42811 avg-us/caller  63400 worst-us/caller
    GCS/Chunker           66204 avg-us/caller  68199 worst-us/caller
    GCS/Chunker           66351 avg-us/caller  68814 worst-us/caller
    GCS/Chunker           66261 avg-us/caller  67971 worst-us/caller
    NFS/StreamingChunker   6038 avg-us/caller  10387 worst-us/caller
    NFS/StreamingChunker   6020 avg-us/caller  10392 worst-us/caller
    NFS/StreamingChunker   6041 avg-us/caller  10434 worst-us/caller
    NFS/Chunker           13812 avg-us/caller  15299 worst-us/caller
    NFS/Chunker           13839 avg-us/caller  15880 worst-us/caller
    NFS/Chunker           13773 avg-us/caller  15514 worst-us/caller

| Backend | Chunker | Avg | Worst | Avg speedup |
|---------|---------|-----|-------|-------------|
| GCS | StreamingChunker | 42.8ms | 63.3ms | 1.55x |
| GCS | Chunker (baseline) | 66.3ms | 68.3ms | — |
| NFS | StreamingChunker | 6.0ms | 10.4ms | 2.3x |
| NFS | Chunker (baseline) | 13.8ms | 15.5ms | — |

The ~5ms worst-case gap (constant across both backends) is fixed overhead in the old Chunker: a burst of 1024 sync.Map.Store calls after ReadAt plus WaitMap channel signaling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire StreamingChunker into the single production call site (StorageDiff.Init). All UFFD page faults, NBD reads, and prefetch operations now use progressive chunk fetching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove the ReadAt fallback in cachedSeekable.OpenRangeReader: Seekable embeds StreamingReader, so the type assertion was dead code
- Reuse the buffer in StreamingChunker.WriteTo instead of allocating 4MB per iteration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arison

Logs every 100th Slice() call (hit/miss) and all failures. Temporary; will be removed after cluster benchmarking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmarking complete — StreamingChunker shows ~20% p50 improvement on sandbox Create times vs baseline Chunker. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract Chunker interface from the concrete FullFetchChunker (renamed from Chunker) so StorageDiff can switch implementations at runtime. When the use-streaming-chunker flag is enabled, StreamingChunker is used; otherwise the original FullFetchChunker is used (safe default). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
packages/orchestrator/internal/sandbox/block/streaming_chunk.go
```go
mu        sync.Mutex
chunkOff  int64
chunkLen  int64
blockSize int64
```
This is suspicious but not changed in this PR. The block size of a chunker for buildId A seems to be determined by the block size of the first top-level build file (B) that referred to it in its mappings. This seems OK, since the block size is used only for tracking isCached, etc.
…-faster

# Conflicts:
#	packages/shared/pkg/feature-flags/flags.go
…m builds

When StorageDiff.Init() creates a StreamingChunker for upstream builds, it no longer calls obj.Size() (GCS Attrs / S3 HeadObject). Instead, the object size is discovered for free from the first range-read response (Content-Range header / GCS reader.Attrs.Size) and the mmap-backed cache is created lazily on first fetch.

Storage layer changes:
- GCS: cache the total size from reader.Attrs.Size in ReadAt/OpenRangeReader
- AWS: parse the Content-Range header in ReadAt/OpenRangeReader
- Both: Size() returns the cached value first, falling back to a metadata call

StreamingChunker changes:
- Constructor takes sizeFunc instead of a pre-resolved size
- Cache created on first runFetch via sync.Once
- atomic.Pointer[Cache] and atomic.Int64 for size make every concurrent read self-documenting via .Load(); no separate init gate needed
- fetchSession uses a back-reference to the chunker instead of duplicating cache/blockSize fields
- isCached fast path in registerAndWait moved before s.mu.Lock (atomic load + sync.Map are already thread-safe)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
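Discovering the object size from the first range read on S3 means reading the total out of the `Content-Range` response header, which has the form `bytes <first>-<last>/<total>`. A minimal sketch of that parse, assuming the commit's approach (the function name is illustrative; the real AWS backend code may differ):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseContentRangeTotal extracts the total object size from an HTTP
// Content-Range header such as "bytes 0-4194303/1073741824".
func parseContentRangeTotal(h string) (int64, error) {
	i := strings.LastIndexByte(h, '/')
	if i < 0 || i == len(h)-1 {
		return 0, fmt.Errorf("malformed Content-Range: %q", h)
	}
	total := h[i+1:]
	// RFC 9110 allows "*" when the total length is unknown.
	if total == "*" {
		return 0, fmt.Errorf("total size unknown in %q", h)
	}
	return strconv.ParseInt(total, 10, 64)
}

func main() {
	size, err := parseContentRangeTotal("bytes 0-4194303/1073741824")
	fmt.Println(size, err) // 1073741824 <nil>
}
```

Caching this value on the first range read is what lets Size() skip the separate HeadObject/Attrs round trip entirely.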
ae3827b to 83ab9f5
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83ab9f567b
```diff
 	size := c.size.Load()
-	for i := int64(0); i < c.size; i += storage.MemoryChunkSize {
+	for i := int64(0); i < size; i += storage.MemoryChunkSize {
```
Initialize chunker before streaming full contents
With lazy initialization, c.size remains 0 until a range fetch runs, but WriteTo now reads size := c.size.Load() and immediately returns (0, nil) when called before any fetch. In streaming mode, StorageDiff.Init no longer fetches the size up front, so a first operation that exports via WriteTo can silently emit empty output instead of the object's bytes.
```diff
 	}
-	b, cacheErr := c.cache.Slice(off, length)
+	b, cacheErr := c.cache.Load().Slice(off, length)
```
Guard cache dereference when no fetch was needed
Because cache creation is deferred, c.cache.Load() can still be nil on the first call; if Slice is invoked with length == 0 (including ReadAt with an empty buffer), all fetch goroutines short-circuit with clippedLen <= 0, eg.Wait() succeeds, and this line dereferences a nil cache pointer, causing a panic. Previously the cache always existed, so zero-length reads were safe.
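A guard along the lines the review suggests could short-circuit zero-length reads and report an uninitialized cache instead of panicking. This is a hypothetical sketch, not the PR's code; `Cache`, `chunker`, and `slice` are stand-in names for the real mmap-backed types:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Cache stands in for the mmap-backed chunk cache; Slice hands back a
// sub-slice of the cached bytes.
type Cache struct{ buf []byte }

func (c *Cache) Slice(off, length int64) ([]byte, error) {
	return c.buf[off : off+length], nil
}

type chunker struct {
	cache atomic.Pointer[Cache] // nil until the first fetch creates it
}

// slice guards the lazily created cache: zero-length reads return early,
// and a still-nil cache becomes an error instead of a nil dereference.
func (c *chunker) slice(off, length int64) ([]byte, error) {
	if length == 0 {
		return nil, nil
	}
	cache := c.cache.Load()
	if cache == nil {
		return nil, fmt.Errorf("cache not initialized")
	}
	return cache.Slice(off, length)
}

func main() {
	c := &chunker{}
	b, err := c.slice(0, 0)
	fmt.Println(len(b), err) // 0 <nil>: the zero-length path no longer panics
}
```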
Let's first add quick monitoring for the size requests: how much time is spent on size requests in total (and probably a count per sandbox) when starting a sandbox, to get a better idea of the impact.
Add timer recording to the GCS and NFS cache Size() methods to measure how much time is actually spent on Size() calls during sandbox creation. This data will help decide whether to move forward with #1887, which defers Size() calls but introduces a fair amount of complexity in the Size() handling.

Metrics produced:
- orchestrator.storage.gcs.read with operation=Size
- orchestrator.storage.slab.nfs.read with operation=Size (vs ReadAt)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Will be superseded in the compression feature by storing the upstream build USize in the header.
The idea is that we use the first read to size the cache, rather than explicitly fetching Size() beforehand. I am not sure that the extra complexity this PR introduces is worth it. Note that it targets only the new "streaming" chunker; the old one keeps its existing logic.