Skip to content

cache lookahead - part 2#674

Open
mzihlmann wants to merge 6 commits into
mainfrom
cache-lookahead-part-2b
Open

cache lookahead - part 2#674
mzihlmann wants to merge 6 commits into
mainfrom
cache-lookahead-part-2b

Conversation

@mzihlmann
Copy link
Copy Markdown
Collaborator

@mzihlmann mzihlmann commented May 3, 2026

Description

Moves the dryrun render output to after the cache lookahead phase. This makes it possible to include cache hit/miss results in the dryrun plan output, and enables golden tests to verify cache lookahead behaviour end-to-end.

Adds the first golden test for cache lookahead (test_issue_mz334).

Submitter Checklist

  • Includes unit tests
  • Adds integration tests if needed

Release Notes

No user-facing changes.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 3, 2026

Codecov Report

❌ Patch coverage is 59.01639% with 25 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pkg/executor/build.go 63.15% 16 Missing and 5 partials ⚠️
pkg/executor/fakes.go 0.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

@mzihlmann mzihlmann force-pushed the cache-lookahead-part-2b branch 3 times, most recently from 18ced15 to ceed92c Compare May 3, 2026 14:31
@mzihlmann
Copy link
Copy Markdown
Collaborator Author

Code Review

Overall this is clean and well-motivated. Moving dryrun to after the lookahead phase is the right call — it lets the plan output carry actual cache hit/miss annotations, and the golden test approach gives solid end-to-end coverage for this.


Potential bug: command index misalignment in RenderStages

stageCacheInfo is indexed by position in stageBuilder.cmds, which is built by filtering stage.Commands through GetCommand (nil results are skipped, build.go:152-154). But RenderStages iterates s.Commands (the raw Dockerfile commands) with jdx and reads ci.cacheKeys[jdx], ci.redirectKeys[jdx] etc.

If GetCommand ever returns nil for a command, s.cmds is shorter than s.Commands, the indices diverge, and either the wrong key is shown or an out-of-bounds panic occurs.

In practice GetCommand returning nil probably only happens for edge-case commands, so this won't bite immediately — but it's a structural mismatch worth closing before it grows. The fix is straightforward: align the indexing, e.g. by pre-sizing cacheInfo slices on len(s.Commands) instead of len(s.cmds) and writing at the position in s.Commands rather than the s.cmds iterator.


Unused golden file

golden/testdata/test_issue_mz195/plans/noskip is added in this PR but no test case in test_issue_mz195/test.go references "noskip" as a Plan. Looks like a prepared fixture for a test that didn't land in this PR — either add the test case or drop the file.


Minor: global-var DI for NewLayerCache

Using a package-level var NewLayerCache = newLayerCacheImpl for test injection is pragmatic and works fine here since the golden tests don't call t.Parallel(). Worth a comment if parallelism is ever added, since the swap isn't goroutine-safe.


No concerns with the fakeLayerCacheFakeLayerCache / keySequenceKeySequence export — it's the minimal change needed to share the fake across packages and the t.Cleanup restore is correct.

@mzihlmann mzihlmann force-pushed the cache-lookahead-part-2b branch 2 times, most recently from f83d378 to 0bf55fa Compare May 3, 2026 15:06
@mzihlmann mzihlmann marked this pull request as ready for review May 3, 2026 15:52
@mzihlmann mzihlmann requested review from 0hlov3, BobDu, babs and nejch May 3, 2026 15:52
mzihlmann added 6 commits May 7, 2026 20:46
GetCommand returns nil for deprecated instructions like MAINTAINER, causing
newStageBuilder to produce a shorter s.cmds than stage.Commands. This creates
an index skew: optimize writes stageCacheInfo at s.cmds positions while
RenderStages reads it at stage.Commands positions, so a MAINTAINER before a
cached RUN prints the wrong cache key and panics out-of-bounds on the last
command. Keeping nil entries in s.cmds fixes the alignment; loops that call
methods on commands already guard nil or are updated here to do so.
The lookahead phase calls NewLayerCache inside the per-stage loop, creating a
fresh cache client for every stage. This breaks FakeLayerCache in tests (the
key sequence resets to position 0 for each stage) and wastes a connection in
production. A single client is correct: the lookahead is one logical query.
@mzihlmann mzihlmann force-pushed the cache-lookahead-part-2b branch from 0bf55fa to 3b334f0 Compare May 7, 2026 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant