feature: Compression for memfile/rootfs assets#2034

Open
levb wants to merge 62 commits into main from lev-compression-final

Conversation


@levb commented Mar 2, 2026

Summary

Compression for data files (memfile, rootfs). Files are broken into independently decompressible frames (2 MiB, zstd) and stored in GCS alongside V4 headers with per-mapping frame tables. Fully backward-compatible: the read path auto-detects V3/V4 headers and routes compressed vs. uncompressed reads per mapping. Gated by the compress-config LaunchDarkly flag (per-team/cluster/template targeting).

What changed

  • FramedFile interface replaces Seekable — unified GetFrame(ctx, offset, frameTable, decompress, buf, readSize, onRead) handles both compressed and uncompressed data
  • V4 header with FrameTable per mapping + BuildFileInfo (uncompressed size, SHA-256 checksum) per build; LZ4-block-compressed header blob
  • NFS cache extended for compressed frames (.frm files keyed by compressed offset+size); progressive streaming decompression on cache miss; write-through on upload
  • P2P resume integration — peers read uncompressed from origin during upload, then atomically swap to V4 header (CAS) when origin signals use_storage with serialized headers
  • compress-build CLI for background compression of existing uncompressed builds (supports --recursive for dependency chains)
  • New Chunker with mmap cache, and fetch sessions dedupe replacing streaming_chunk.go

Read path

  NBD/UFFD/Prefetch
    → header.GetShiftedMapping(offset) → BuildMap + FrameTable
    → DiffStore.Get(ctx, diff)         → cached Chunker
    → Chunker.GetBlock(offset, len, ft)
        → mmap hit? return reference
        → miss: fetchSession (dedup) → GetFrame
            → NFS hit? decompress from disk → mmap
            → NFS miss? GCS range read → decompress → mmap + NFS write-back

P2P header switchover

  Origin (pause):
    snapshot → register buildID in Redis → serve mmap cache via gRPC
    background: upload compressed data + V4 headers to GCS
    on completion: uploadedBuilds.Set(buildID, serialized V4 headers)
                → peerRegistry.Unregister(buildID)

  Peer (resume, upload in progress):
    GetFrame(ft=nil) → gRPC stream → origin serves from mmap (uncompressed)

  Peer (origin signals use_storage):
    checkPeerAvailability() → transitionHeaders.Store({memH, rootH})
                            → uploaded.Store(true)
    next GetFrame(ft=nil): ft==nil + transitionHeaders != nil
      → return PeerTransitionedError{headers}
      → build.File.swapHeader(): Deserialize(bytes) → CompareAndSwap(old, new)
        first goroutine wins CAS; others see swapped header on retry
      → retry: GetFrame(ft!=nil) → NFS/GCS compressed (mmap mostly warm)

Benchmark results

End-to-end pause/resume

(BenchmarkBaseImage, 50 iterations, local disk):

  ┌──────────────┬─────────┬────────────┐
  │     Mode     │ Latency │ Build time │
  ├──────────────┼─────────┼────────────┤
  │ Uncompressed │ 97 ms   │ 61.0s      │
  ├──────────────┼─────────┼────────────┤
  │ LZ4:0        │ 100 ms  │ 61.4s      │
  ├──────────────┼─────────┼────────────┤
  │ Zstd:1       │ 100 ms  │ 60.9s      │
  ├──────────────┼─────────┼────────────┤
  │ Zstd:2       │ 102 ms  │ 62.4s      │
  ├──────────────┼─────────┼────────────┤
  │ Zstd:3       │ 98 ms   │ 61.7s      │
  └──────────────┴─────────┴────────────┘

Full architecture doc: docs/compression-architecture.md

levb and others added 18 commits February 27, 2026 05:52
…ning

- Use header.HugepageSize for uncompressed fetch alignment (semantically correct)
- Stream NFS cache hits directly into ReadFrame instead of buffering in memory
- Fix timer placement to cover full GetFrame (read + decompression)
- Fix onRead callback: nil for compressed inner calls (prevents double-invoke),
  pass through for uncompressed (bytes are final)
- Remove panic recovery from runFetch (never in main)
- Remove low-value chunker tests subsumed by ConcurrentStress
- Remove 4MB frame configs from benchmarks (targeting 2MB only)
- Remove unused readCacheFile function

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ble NFS cache

- Remove dead flagsClient chain through chunker/build/template layers (~15 files)
- Delete ChunkerConfigFlag (unused after flagsClient removal)
- Delete mock_flagsclient_test.go
- Simplify GetUploadOptions: remove redundant intOr/strOr fallbacks (flags have defaults)
- Add GetCompressionType helper to frame_table.go, deduplicate compression type extraction
- Replace [16]byte{} with uuid.Nil and "rootfs.ext4" with storage.RootfsName in inspect-build
- Simplify UploadV4Header return pattern
- Remove onRead callback from legacy fullFetchChunker (FullFetch should not use progressive reads)
- Re-enable NFS cache in template cache.go
- Remove all fmt.Printf debug instrumentation from orchestrator

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sionType threading

Add per-build file size and SHA-256 checksum to V4 headers, eliminating
the redundant Size() network call when opening upstream data files on
the read path. Checksums are computed for free by piggybacking on
CompressStream's existing frame iteration.

Remove the separate compressionType parameter threaded through
getBuild → newStorageDiff → NewChunker; the read path now derives
compression state from the per-mapping FrameTable directly.

V4 binary format change (not yet deployed):
  [Metadata] [LZ4: numBuilds, builds(uuid+size+checksum),
              numMappings, mappings...]

V3 path unchanged — falls back to Size() call when size is unknown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
levb and others added 6 commits March 2, 2026 10:58
- Merge writeFrameToCache and writeChunkToCache into unified writeToCache
  with lock + atomic rename, used by all three cache write paths
- Fix file descriptor leak in cache hit paths: defer f.Close() and wrap
  in NopCloser so ReadFrame's close doesn't double-close the fd
- Add defer uploader.Close() in CompressStream so PartUploader file
  handles are released on error paths between Start() and Complete()
- Make Close() idempotent via sync.Once on fsPartUploader and filePartWriter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -0,0 +1,474 @@
package main
levb (author):

2/5 may not be very useful once we merge the PR; remove after a final run?

func (o *awsObject) StoreFile(ctx context.Context, path string) error {
func (o *awsObject) StoreFile(ctx context.Context, path string, opts *FramedUploadOptions) (*FrameTable, [32]byte, error) {
if opts != nil && opts.CompressionType != CompressionNone {
return nil, [32]byte{}, fmt.Errorf("compressed uploads are not supported on AWS (builds target GCP only)")
levb (author):

Support?

levb and others added 2 commits March 3, 2026 06:09
The SHA-256 checksum in BuildFileInfo now covers uncompressed data,
making it useful for end-to-end integrity verification of the original
content. Updated inspect-build to use SHA-256 (replacing MD5) and
verify checksums against the header. Fixed early-return lint warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GetUploadOptions now accepts fileType and useCase parameters, enriching
the LD evaluation context so dashboard targeting rules can differentiate
(e.g. compress memfile but not rootfs, or builds but not pauses).
TemplateBuild accepts per-file opts directly instead of holding an ff
reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
levb and others added 8 commits March 4, 2026 12:22
Keep the original filename so GitHub detects the rename from main.
Restore and adapt all tests from the old seekable_test.go for the
FramedFile interface (GetFrame replaces ReadAt/OpenRangeReader).

Add new tests for compression-specific behavior:
- UseStorage response stores transition headers
- TransitionHeaders triggers PeerTransitionedError on fallback
- Non-nil FrameTable bypasses transition check
- Uploaded flag skips peer entirely
- OnRead callback and partial stream error handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename gRPC RPC and message types to match current FramedFile API
- Restore uploadedBuildsTTL constant (1h) in server/main.go
- Rename peerUseStorageResponse → buildUploadedResponse
- Restore cache.go to match main's structure (only nil-guard changes)
- Restore sandboxes.go to match main's structure (snapshotResult,
  uploadSnapshotAsync, PeerToPeerAsyncCheckpointFlag, inline prefetch)
  with minimal compression-only changes (TemplateBuild.UploadAtOnce,
  V4 header serialization in completeUpload)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BUILD INFO section showing V4 per-build file sizes and SHA-256 checksums
- Fix validateCompressedFrames to read each build's own header for complete
  frame tables (child headers omit frames for overwritten parent blocks)
- Hide per-mapping listing by default, add -mappings flag to show it
- Revert cmdutil multi-artifact changes (CompressedFiles, allCompressionTypes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_, dirty := c.dirty.Load(off + blockOff)
if !dirty {
return false
// isBlockCached reports whether a single block is marked as cached.
levb (author):

0/5, separate PR: we should move to block indices for block operations (vs. the current offsets) everywhere, consistently. Indices are more intention-revealing, can be given their own type, and add no overhead (just a shift to recover the offset).


// --- BenchmarkColdConcurrent ------------------------------------------------

func BenchmarkColdConcurrent(b *testing.B) {
levb (author):

❯ go test -bench=. -benchtime=20x -timeout=30m -bench=BenchmarkCold ./packages/orchestrator/internal/sandbox/block/ -run=^$ -v | tee /tmp/ttt
goos: linux
goarch: amd64
pkg: github.com/e2b-dev/infra/packages/orchestrator/internal/sandbox/block
cpu: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
BenchmarkColdConcurrent
BenchmarkColdConcurrent/GCS
BenchmarkColdConcurrent/GCS/no-frame
BenchmarkColdConcurrent/GCS/no-frame/block=4KB
BenchmarkColdConcurrent/GCS/no-frame/block=4KB/Legacy
BenchmarkColdConcurrent/GCS/no-frame/block=4KB/Legacy-16  	      20	 776428273 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	       128.8 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/GCS/no-frame/block=4KB/Uncompressed
BenchmarkColdConcurrent/GCS/no-frame/block=4KB/Uncompressed-16         	      20	 643719830 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	       155.3 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/GCS/no-frame/block=2MB
BenchmarkColdConcurrent/GCS/no-frame/block=2MB/Legacy
BenchmarkColdConcurrent/GCS/no-frame/block=2MB/Legacy-16               	      20	 771524052 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	       129.6 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/GCS/no-frame/block=2MB/Uncompressed
BenchmarkColdConcurrent/GCS/no-frame/block=2MB/Uncompressed-16         	      20	 661543643 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	       151.2 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/GCS/LZ4/2MB
BenchmarkColdConcurrent/GCS/LZ4/2MB/block=4KB
BenchmarkColdConcurrent/GCS/LZ4/2MB/block=4KB-16                       	      20	 990016727 ns/op	        52.71 C-MB/op	       100.0 U-MB/op	       101.0 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/GCS/LZ4/2MB/block=2MB
BenchmarkColdConcurrent/GCS/LZ4/2MB/block=2MB-16                       	      20	 863048987 ns/op	        52.71 C-MB/op	       100.0 U-MB/op	       115.9 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/GCS/Zstd1/2MB
BenchmarkColdConcurrent/GCS/Zstd1/2MB/block=4KB
BenchmarkColdConcurrent/GCS/Zstd1/2MB/block=4KB-16                     	      20	1074220485 ns/op	        35.56 C-MB/op	       100.0 U-MB/op	        93.09 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/GCS/Zstd1/2MB/block=2MB
BenchmarkColdConcurrent/GCS/Zstd1/2MB/block=2MB-16                     	      20	1030327540 ns/op	        35.56 C-MB/op	       100.0 U-MB/op	        97.06 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/GCS/Zstd2/2MB
BenchmarkColdConcurrent/GCS/Zstd2/2MB/block=4KB
BenchmarkColdConcurrent/GCS/Zstd2/2MB/block=4KB-16                     	      20	 982379629 ns/op	        27.94 C-MB/op	       100.0 U-MB/op	       101.8 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/GCS/Zstd2/2MB/block=2MB
BenchmarkColdConcurrent/GCS/Zstd2/2MB/block=2MB-16                     	      20	 900656884 ns/op	        27.94 C-MB/op	       100.0 U-MB/op	       111.0 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/GCS/Zstd3/2MB
BenchmarkColdConcurrent/GCS/Zstd3/2MB/block=4KB
BenchmarkColdConcurrent/GCS/Zstd3/2MB/block=4KB-16                     	      20	 995036243 ns/op	        29.95 C-MB/op	       100.0 U-MB/op	       100.5 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/GCS/Zstd3/2MB/block=2MB
BenchmarkColdConcurrent/GCS/Zstd3/2MB/block=2MB-16                     	      20	 918356378 ns/op	        29.95 C-MB/op	       100.0 U-MB/op	       108.9 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS
BenchmarkColdConcurrent/NFS/no-frame
BenchmarkColdConcurrent/NFS/no-frame/block=4KB
BenchmarkColdConcurrent/NFS/no-frame/block=4KB/Legacy
BenchmarkColdConcurrent/NFS/no-frame/block=4KB/Legacy-16               	      20	 114663078 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	       872.1 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/NFS/no-frame/block=4KB/Uncompressed
BenchmarkColdConcurrent/NFS/no-frame/block=4KB/Uncompressed-16         	      20	 114954611 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	       869.9 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/NFS/no-frame/block=2MB
BenchmarkColdConcurrent/NFS/no-frame/block=2MB/Legacy
BenchmarkColdConcurrent/NFS/no-frame/block=2MB/Legacy-16               	      20	 110732740 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	       903.1 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/NFS/no-frame/block=2MB/Uncompressed
BenchmarkColdConcurrent/NFS/no-frame/block=2MB/Uncompressed-16         	      20	  88813164 ns/op	       100.0 C-MB/op	       100.0 U-MB/op	      1126 U-MB/s	        25.00 fetches/op
BenchmarkColdConcurrent/NFS/LZ4/2MB
BenchmarkColdConcurrent/NFS/LZ4/2MB/block=4KB
BenchmarkColdConcurrent/NFS/LZ4/2MB/block=4KB-16                       	      20	 100734427 ns/op	        52.71 C-MB/op	       100.0 U-MB/op	       992.7 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS/LZ4/2MB/block=2MB
BenchmarkColdConcurrent/NFS/LZ4/2MB/block=2MB-16                       	      20	  92445820 ns/op	        52.71 C-MB/op	       100.0 U-MB/op	      1082 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS/Zstd1/2MB
BenchmarkColdConcurrent/NFS/Zstd1/2MB/block=4KB
BenchmarkColdConcurrent/NFS/Zstd1/2MB/block=4KB-16                     	      20	 101615345 ns/op	        35.56 C-MB/op	       100.0 U-MB/op	       984.1 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS/Zstd1/2MB/block=2MB
BenchmarkColdConcurrent/NFS/Zstd1/2MB/block=2MB-16                     	      20	 135799172 ns/op	        35.56 C-MB/op	       100.0 U-MB/op	       736.4 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS/Zstd2/2MB
BenchmarkColdConcurrent/NFS/Zstd2/2MB/block=4KB
BenchmarkColdConcurrent/NFS/Zstd2/2MB/block=4KB-16                     	      20	  90600690 ns/op	        27.94 C-MB/op	       100.0 U-MB/op	      1104 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS/Zstd2/2MB/block=2MB
BenchmarkColdConcurrent/NFS/Zstd2/2MB/block=2MB-16                     	      20	 124130836 ns/op	        27.94 C-MB/op	       100.0 U-MB/op	       805.6 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS/Zstd3/2MB
BenchmarkColdConcurrent/NFS/Zstd3/2MB/block=4KB
BenchmarkColdConcurrent/NFS/Zstd3/2MB/block=4KB-16                     	      20	  87162737 ns/op	        29.95 C-MB/op	       100.0 U-MB/op	      1147 U-MB/s	        50.00 fetches/op
BenchmarkColdConcurrent/NFS/Zstd3/2MB/block=2MB
BenchmarkColdConcurrent/NFS/Zstd3/2MB/block=2MB-16                     	      20	 117404483 ns/op	        29.95 C-MB/op	       100.0 U-MB/op	       851.8 U-MB/s	        50.00 fetches/op
PASS
ok  	github.com/e2b-dev/infra/packages/orchestrator/internal/sandbox/block	259.978s


// --- BenchmarkCacheHit ------------------------------------------------------

func BenchmarkCacheHit(b *testing.B) {
levb (author):

(cache hits don't depend on compression, this is just new version against the old version)

❯ go test -bench=. -timeout=30m -bench=BenchmarkCacheHit ./packages/orchestrator/internal/sandbox/block/ -run=^$ -v | tee /tmp/ttt
goos: linux
goarch: amd64
pkg: github.com/e2b-dev/infra/packages/orchestrator/internal/sandbox/block
cpu: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
BenchmarkCacheHit
BenchmarkCacheHit/block=4KB
BenchmarkCacheHit/block=4KB/Legacy
BenchmarkCacheHit/block=4KB/Legacy-16  	 4521610	       261.2 ns/op
BenchmarkCacheHit/block=4KB/Uncompressed
BenchmarkCacheHit/block=4KB/Uncompressed-16         	 9395292	       128.6 ns/op
BenchmarkCacheHit/block=2MB
BenchmarkCacheHit/block=2MB/Legacy
BenchmarkCacheHit/block=2MB/Legacy-16               	 4529709	       269.6 ns/op
BenchmarkCacheHit/block=2MB/Uncompressed
BenchmarkCacheHit/block=2MB/Uncompressed-16         	 9440967	       128.4 ns/op
PASS
ok  	github.com/e2b-dev/infra/packages/orchestrator/internal/sandbox/block	11.035s

levb and others added 7 commits March 4, 2026 17:01
…ssed uploads

Replace AtomicImmutableFile-based progressive NFS writes in
fetchAndDecompressProgressive with a single cacheFrameAsync call after
the fetch goroutine completes. This removes lock.OpenFile, progressive
NFS streaming, and complex atomic file lifecycle management while
keeping the io.Pipe for overlapping GCS fetch with decompression.

Add write-through NFS caching for compressed uploads in
storeFileCompressed, gated by EnableWriteThroughCacheFlag, using
OnFrameReady to async-write each compressed frame via writeToCache.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename peerFramedFile source files from seekable.go to framed.go to
match the type name. Enable per-block CRC checksums on the LZ4 encoder
via BlockChecksumOption(true) for corruption detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fsPartUploader was a near-copy of MemPartUploader that wrote directly
to a file handle, unsafe with CompressStream's concurrent part uploads.
Embed MemPartUploader and write the assembled result atomically on
Complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…4Header

Simplify benchmark to single-mode driven by BENCH_COMPRESS env var
(e.g. "zstd:2", "lz4:0", or empty for uncompressed) instead of
running multiple sub-benchmarks in one process. Add bench.sh to run
each mode in its own process for isolation.

Clone headers before mutation in UploadV4Header to prevent concurrent
map read/write between upload goroutines and UFFD handlers reading
the same header from the template cache.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add section G with mermaid diagrams for the four most complex code
paths: P2P header switchover (full 5-phase sequence diagram), compressed
frame progressive fetch pipeline, NFS cache GetFrame routing, and upload
completion signaling. Fix filename (framedfile.go → framed.go),
DiffStore.Get signature, renumber metrics section to H.

Also include terraform lock update for google-beta provider 6.50.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@levb marked this pull request as ready for review March 5, 2026 18:36
@levb requested a review from djeebus March 5, 2026 18:36
levb and others added 5 commits March 5, 2026 10:58
Convert 6 mermaid diagrams (flowcharts + sequence diagrams) and 1 ASCII
timeline to a single consistent format: indented → chains with ├─/└─
branching and labeled phases. Also replace regionLock references with
fetchSession, convert box-drawing Header States table to markdown.

713 → 598 lines (-16%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…che-hit)

AMD Ryzen 7 8845HS, 16 threads. Key changes from previous numbers:
- Legacy NFS throughput higher than before (907-957 vs 555-578 MB/s)
- Zstd1 NFS 2MB reads at 750 MB/s (decoder overhead on large blocks)
- Cache-hit: 132/130 ns (new) vs 276/269 ns (legacy) = 2.1x
- Updated weighted throughput and recommendation analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential throttledReader (time.Sleep per Read call) with
pipelined io.Pipe simulation so decoder runs concurrently with
simulated transfer — matching real network I/O behavior. The old
approach penalized zstd due to time.Sleep OS scheduling overhead
accumulating across many internal decoder Read calls.

Also add zstd.EncoderLevel mapping comments to both benchmark files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…config fields

Replace the 3-goroutine pipeline (reader → worker pool → reorder collector)
with a single-loop batch-parallel design: read a batch of frames, compress
in parallel via errgroup, emit in order, upload the part asynchronously.
Eliminates channels, reorder map, and inter-stage goroutines.

Rename struct fields and FF JSON keys for clarity:
  - Level → CompressionLevel
  - EncodeWorkers → FrameEncodeWorkers
  - TargetPartSize (bytes) → FramesPerUploadPart (frame count)
  - JSON: "level" → "compressionLevel", "encodeWorkers" → "frameEncodeWorkers",
    "uploadPartTargetMB" → "framesPerUploadPart"

Other changes:
  - LZ4 default compression level 3 → 0 (fast mode)
  - Wire LZ4 encoder concurrency through newLZ4Encoder
  - Add CompressStream tests (round-trip, cancel, part count, race)
  - Add BenchmarkStoreFile (codec × worker matrix, 1 GB, FS-backed)
  - Add write-path benchmark results and analysis to compression doc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace lz4.Writer/lz4.Reader with lz4.CompressBlock/UncompressBlock to
eliminate frame overhead (headers, checksums, streaming decoder machinery)
for latency-sensitive pause/resume paths.

Key changes:
- Encode: lz4FrameCompressor uses CompressBlock with CompressBlockBound-sized dst
- Decode: unified DecompressLZ4(src, dst) function, callers verify exact size
- Consolidate decompress code into decompress.go (delete decoders.go, lz4.go)
- Unify ReadFrame fetch path: single rangeRead call, codec-specific decompress
- Unified readInto helper for progressive and single-shot reads
- Restrict progressive pipe to zstd only in cache layer (LZ4 is all-at-once)
- Remove unused HC compression path (level always 0)
- Fix paralleltest lint issues in compressed_upload_test.go

BenchmarkStoreFile/lz4 (1GB, streaming → block, 3x):
  w1: 231 → 243 MB/s (+5%)
  w2: 388 → 411 MB/s (+6%)
  w4: 602 → 639 MB/s (+6%)
  w8: 740 → 753 MB/s (+2%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines -65 to -66
OpenBlob(ctx context.Context, path string, objectType ObjectType) (Blob, error)
OpenSeekable(ctx context.Context, path string, seekableObjectType SeekableObjectType) (Seekable, error)
Contributor:

The second parameter here was only included to provide compile-time safety for reading a file with the wrong method. For example, calling OpenSeekable(ctx, "file", RootFSHeaderObjectType) would fail to compile. We had a few bugs where we were calling the wrong method, so the data was never cached. Is this no longer possible or useful?


// minProgressiveReadSize is the floor for progressive reads to avoid
// tiny I/O when the caller's block size is small (e.g. 4 KB rootfs).
const minProgressiveReadSize = 256 * 1024 // 256 KB
Contributor:

Maybe make this a feature flag, so we can tune it in production?

levb (author):

OK, I actually dropped it accidentally; it's in chunker-config now.

Comment on lines +282 to +291
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
break
}

if err != nil {
return Range{}, fmt.Errorf("progressive read error after %d bytes: %w", total, err)
}
}

return Range{Start: rangeStart, Length: int(total)}, nil
Contributor:

I'm slowly coming to the understanding that returning io.EOF when reaching the end of a file is useful, and avoids an extra read that'll never work. Might be useful here, and have callers take advantage of it.

// Each backend (GCP, AWS, FS) calls this with their own rangeRead callback.
// Exported for use by CLI tools (inspect-build, compress-build) and tests that
// need to read frames outside the normal StorageProvider stack.
func ReadFrame(ctx context.Context, rangeRead RangeReadFunc, storageDetails string, offsetU int64, frameTable *FrameTable, decompress bool, buf []byte, readSize int64, onRead func(totalWritten int64)) (Range, error) {
Contributor:

This seems like an easy place to have AI add some tests verifying behavior.
