
WIP: An attempt to delay fetching the build's size for MMap until first read #1887

Closed
levb wants to merge 14 commits into main from lev-get-size-lazily

Conversation

@levb levb commented Feb 10, 2026

The idea is that we use the first read to size the cache, rather than explicitly fetching Size() beforehand. I am not sure that the extra complexity this PR introduces is worth it.

Note that this targets only the new "streaming" chunker; the old one's logic is unchanged.

levb and others added 10 commits February 9, 2026 06:24
Replace the bulk Chunker with a StreamingChunker that reads data
progressively from upstream (GCS/NFS), notifying blocked callers as
soon as their specific byte range arrives instead of waiting for the
entire 4MB chunk. Benchmarks show ~2.3x lower average caller latency
under random access patterns (~4.5ms vs ~10.2ms simulated GCS).

- Add StreamingReader interface (OpenRangeReader) to storage package,
  implemented on all backends (GCS, FS, AWS, NFS cache)
- Add read-through caching on cachedSeekable.OpenRangeReader
- Embed StreamingReader in Seekable interface
- Switch StorageDiff from Chunker to StreamingChunker

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert storage_diff.go to use the existing Chunker so the
StreamingChunker is not yet on any production code path.

StreamingChunker improvements:
- Sort waiters by endByte for O(satisfied) notification instead of O(all)
- Add 60s fetch timeout and thread context through runFetch/progressiveRead
- Add panic recovery in runFetch so waiters never hang on panics
- Add cancelOnCloseReader to GCS OpenRangeReader for proper context cleanup
- Add nolint:containedctx to cacheWriteThroughReader.ctx
- Add TestStreamingChunker_PanicRecovery test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
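The panic-recovery behavior described for runFetch can be illustrated with a minimal sketch. The function name mirrors the commit, but the body is hypothetical: the fetch goroutine converts a panic into an error and always closes its completion channel, so waiters are released rather than hanging.

```go
package main

import "fmt"

// runFetch is a toy version of the pattern in the commit: the fetch
// goroutine recovers from panics and converts them into an error
// delivered to waiters, so a blocked caller is always woken up.
func runFetch(fetch func() ([]byte, error)) (data []byte, err error) {
	done := make(chan struct{})
	go func() {
		defer close(done) // deferred first, runs last: waiters released even on panic
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("fetch panicked: %v", r)
			}
		}()
		data, err = fetch()
	}()
	<-done
	return data, err
}

func main() {
	_, err := runFetch(func() ([]byte, error) { panic("upstream bug") })
	fmt.Println(err) // fetch panicked: upstream bug

	b, err := runFetch(func() ([]byte, error) { return []byte("ok"), nil })
	fmt.Println(string(b), err)
}
```

Note that defers run LIFO, so the recover runs before close(done), guaranteeing err is set before any waiter observes completion.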
The previous benchmark used a zero-latency upstream and reported
analytically computed latency predictions. Replace with a
bandwidth-limited upstream (real time.Sleep) tuned to observed
production latencies:
  GCS: 20ms TTFB + 100 MB/s → ~62ms per 4MB chunk (observed ~60ms)
  NFS:  1ms TTFB + 500 MB/s → ~9ms  per 4MB chunk (observed ~9-10ms)

Results confirm the analytical predictions and additionally surface
a ~5ms constant overhead in the old Chunker's post-transfer path
(batch setIsCached + WaitMap signaling) that the analytical model
could not capture.

Also reduces concurrent callers from 20 to 3 to match realistic
UFFD/NBD concurrency (1-2 vCPU faults + 1 NBD request), and
pre-generates a shared offset sequence across all sub-benchmarks
for direct comparability.

Results (300 iterations × 3 counts, 3 concurrent callers):

  GCS/StreamingChunker  42790 avg-us/caller  63260 worst-us/caller
  GCS/StreamingChunker  42776 avg-us/caller  63221 worst-us/caller
  GCS/StreamingChunker  42811 avg-us/caller  63400 worst-us/caller
  GCS/Chunker           66204 avg-us/caller  68199 worst-us/caller
  GCS/Chunker           66351 avg-us/caller  68814 worst-us/caller
  GCS/Chunker           66261 avg-us/caller  67971 worst-us/caller
  NFS/StreamingChunker   6038 avg-us/caller  10387 worst-us/caller
  NFS/StreamingChunker   6020 avg-us/caller  10392 worst-us/caller
  NFS/StreamingChunker   6041 avg-us/caller  10434 worst-us/caller
  NFS/Chunker           13812 avg-us/caller  15299 worst-us/caller
  NFS/Chunker           13839 avg-us/caller  15880 worst-us/caller
  NFS/Chunker           13773 avg-us/caller  15514 worst-us/caller

  Backend  Implementation      Avg     Worst   Avg speedup
  GCS      StreamingChunker    42.8ms  63.3ms  1.55x
  GCS      Chunker (baseline)  66.3ms  68.3ms  —
  NFS      StreamingChunker     6.0ms  10.4ms  2.3x
  NFS      Chunker (baseline)  13.8ms  15.5ms  —

The ~5ms worst-case gap (constant across both backends) is fixed
overhead in the old Chunker: burst of 1024 sync.Map.Store calls
after ReadAt + WaitMap channel signaling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire StreamingChunker into the single production call site
(StorageDiff.Init). All UFFD page faults, NBD reads, and
prefetch operations now use progressive chunk fetching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove ReadAt fallback in cachedSeekable.OpenRangeReader: Seekable
  embeds StreamingReader, so the type assertion was dead code
- Reuse buffer in StreamingChunker.WriteTo instead of allocating 4MB
  per iteration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arison

Logs every 100th Slice() call (hit/miss) and all failures.
Temporary — will be removed after cluster benchmarking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmarking complete — StreamingChunker shows ~20% p50 improvement
on sandbox Create times vs baseline Chunker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract Chunker interface from the concrete FullFetchChunker (renamed from
Chunker) so StorageDiff can switch implementations at runtime. When the
use-streaming-chunker flag is enabled, StreamingChunker is used; otherwise
the original FullFetchChunker is used (safe default).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
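The flag-gated selection between the two implementations can be sketched like this. The type names match the commit; the single-method interface and the newChunker constructor are simplifications (the real Chunker interface presumably exposes Slice, WriteTo, etc.).

```go
package main

import "fmt"

// Chunker is a sketch of the extracted interface; the real one in the
// repo has the actual data-access methods.
type Chunker interface {
	Name() string
}

type FullFetchChunker struct{}

func (*FullFetchChunker) Name() string { return "FullFetchChunker" }

type StreamingChunker struct{}

func (*StreamingChunker) Name() string { return "StreamingChunker" }

// newChunker picks the implementation based on the feature flag,
// defaulting to the original FullFetchChunker when the flag is off.
func newChunker(useStreaming bool) Chunker {
	if useStreaming {
		return &StreamingChunker{}
	}
	return &FullFetchChunker{}
}

func main() {
	fmt.Println(newChunker(false).Name()) // FullFetchChunker
	fmt.Println(newChunker(true).Name())  // StreamingChunker
}
```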
mu sync.Mutex
chunkOff int64
chunkLen int64
blockSize int64

This is suspicious, but not changed in this PR. The block size of a chunker for buildId A seems to be determined by the block size of the first top-level build file (B) that referred to it in its mappings. This seems OK since the block size is used only for tracking isCached, etc.

levb and others added 2 commits February 10, 2026 14:58
…-faster

# Conflicts:
#	packages/shared/pkg/feature-flags/flags.go
…m builds

When StorageDiff.Init() creates a StreamingChunker for upstream builds,
it no longer calls obj.Size() (GCS Attrs / S3 HeadObject). Instead, the
object size is discovered for free from the first range-read response
(Content-Range header / GCS reader.Attrs.Size) and the mmap-backed
cache is created lazily on first fetch.

Storage layer changes:
- GCS: cache total size from reader.Attrs.Size in ReadAt/OpenRangeReader
- AWS: parse Content-Range header in ReadAt/OpenRangeReader
- Both: Size() returns cached value first, falls back to metadata call

StreamingChunker changes:
- Constructor takes sizeFunc instead of pre-resolved size
- Cache created on first runFetch via sync.Once
- atomic.Pointer[Cache] and atomic.Int64 for size make every concurrent
  read self-documenting via .Load() — no separate init gate needed
- fetchSession uses back-reference to chunker instead of duplicating
  cache/blockSize fields
- isCached fast-path in registerAndWait moved before s.mu.Lock (atomic
  load + sync.Map are already thread-safe)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
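The "size for free" idea for the AWS path rests on the Content-Range response header, whose final component is the total object size. A minimal sketch of that parsing, with a hypothetical helper name (the actual code in the repo may differ):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseContentRangeTotal extracts the total object size from a
// Content-Range header such as "bytes 0-1048575/8388608". A "*" total
// means the server does not know the full size.
func parseContentRangeTotal(h string) (int64, error) {
	i := strings.LastIndexByte(h, '/')
	if i < 0 {
		return 0, fmt.Errorf("malformed Content-Range: %q", h)
	}
	total := h[i+1:]
	if total == "*" {
		return 0, fmt.Errorf("total size unknown: %q", h)
	}
	return strconv.ParseInt(total, 10, 64)
}

func main() {
	n, err := parseContentRangeTotal("bytes 0-1048575/8388608")
	fmt.Println(n, err) // 8388608 <nil>
}
```

On the GCS side the equivalent information comes from the client library's reader attributes rather than a raw header, which is why the commit caches it in both ReadAt and OpenRangeReader.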
@levb levb force-pushed the lev-get-size-lazily branch from ae3827b to 83ab9f5 Compare February 10, 2026 23:08
@levb levb marked this pull request as ready for review February 10, 2026 23:25

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83ab9f567b


Comment on lines +232 to +234

      size := c.size.Load()

  -   for i := int64(0); i < c.size; i += storage.MemoryChunkSize {
  +   for i := int64(0); i < size; i += storage.MemoryChunkSize {


P1: Initialize chunker before streaming full contents

With lazy initialization, c.size remains 0 until a range fetch runs, but WriteTo now reads size := c.size.Load() and immediately returns (0, nil) when called first. In streaming mode StorageDiff.Init no longer fetches size up front, so a first operation that exports via WriteTo can silently emit an empty output instead of the object bytes.


      }

  -   b, cacheErr := c.cache.Slice(off, length)
  +   b, cacheErr := c.cache.Load().Slice(off, length)


P2: Guard cache dereference when no fetch was needed

Because cache creation is deferred, c.cache.Load() can still be nil on the first call; if Slice is invoked with length == 0 (including ReadAt with an empty buffer), all fetch goroutines short-circuit with clippedLen <= 0, eg.Wait() succeeds, and this line dereferences a nil cache pointer, causing a panic. Previously the cache always existed, so zero-length reads were safe.



dobrac commented Feb 10, 2026

Let's first add quick monitoring for the size requests: how much time is spent on size requests in total (and probably a count per sandbox) when starting a sandbox, to get a better idea of the impact.

levb added a commit that referenced this pull request Feb 11, 2026
Add timer recording to GCS and NFS cache Size() methods to measure how
much time is actually spent on Size() calls during sandbox creation.
This data will help decide whether to move forward with
#1887, which defers Size() calls
but introduces a fair amount of complexity in the Size() handling.

Metrics produced:
- orchestrator.storage.gcs.read with operation=Size
- orchestrator.storage.slab.nfs.read with operation=Size (vs ReadAt)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

levb commented Feb 11, 2026

@dobrac #1891

Base automatically changed from lev-slice-return-faster to main February 18, 2026 19:57
@dobrac dobrac marked this pull request as draft February 21, 2026 01:33

levb commented Mar 2, 2026

Will be superseded in the compression feature by storing the upstream build USize in the header.

@levb levb closed this Mar 2, 2026
@levb levb deleted the lev-get-size-lazily branch March 2, 2026 13:51