[image_spec]: add FLYTEKIT_USE_DEPOT env var override and docker fallback on depot auth errors#36
Closed
[image_spec]: add FLYTEKIT_USE_DEPOT env var override and docker fallback on depot auth errors#36
Conversation
…fallback on depot auth errors Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
…rve build streaming Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
…led (False) in preflight Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why are the changes needed?
The
DEPOT_TOKENstored in AWS Secrets Manager expired/became invalid, causing all Flyte image builds to fail withpermission_denied: Invalid token. Sinceuse_depot=Trueis the default onImageSpec, there was no way to bypass depot without code changes, and no fallback mechanism.What changes were proposed in this pull request?
Refactors
DefaultImageBuilder._build_imageto add resilience against depot auth failures:FLYTEKIT_USE_DEPOTenv var — overridesImageSpec.use_depotat runtime. SetFLYTEKIT_USE_DEPOT=falseto skip depot entirely without code changes.depot projects listto verify credentials. If the preflight detects an auth error (or times out), automatically falls back todocker image build(if docker is available and running). The actual build command still usesrun(command, check=True)so stdout/stderr stream in real-time._resolve_use_depot,_build_command,_validate_build_tool,_is_depot_auth_error,_depot_auth_preflightfor testability._depot_auth_preflightreturns a tri-statebool | None:True— auth ok, proceed with depotFalse— auth failed, fall back to dockerNone— depot not installed, fall through to_validate_build_tool(which raises the proper "not installed" error)Updates since last revision
capture_output=Trueapproach with a preflight auth check (_depot_auth_preflight) to avoid buffering build output during long builds. The fallback decision now happens before the build starts, preserving real-time streaming._depot_auth_preflightnow catchessubprocess.TimeoutExpired(15s timeout ondepot projects list) and returnsFalseto trigger docker fallback instead of crashing the build._depot_auth_preflightnow distinguishes "depot not installed" (None) from "auth failed" (False), so missing-depot errors propagate correctly through_validate_build_toolinstead of printing a misleading auth failure message.How was this patch tested?
Inline Python tests verifying:
_resolve_use_depotrespects env var override and falls back toimage_spec.use_depot_is_depot_auth_errormatches expected patterns ("permission_denied", "not authorized", "invalid token", "unauthenticated") and rejects unrelated errors_build_commandgenerates correct depot/docker commands, includes--projectfor nix, skips--pushfor nixNo integration tests for the actual fallback path (requires depot installed with an invalid token).
Human review checklist
_depot_auth_preflightrunsdepot projects list(with 15s timeout) on every build whereuse_depot=True. This adds a network round-trip even when depot auth is fine. Consider whether caching the preflight result or skipping the check would be better.depot projects listfails with a non-auth error (e.g. rate limit, 500),_depot_auth_preflightreturnsTrueand the build proceeds with depot. The build could then fail for the same transient reason.shutil.whichandrun:_validate_build_toolchecksshutil.which("docker")then callsrun(["docker", "info"]). The original code wrapped this in a broadertry/except; the new code letsFileNotFoundErrorpropagate uncaught if the binary disappears between those calls.Check all the applicable boxes
Link to Devin run | Requested by: @ryanjwong