[image-spec]: revert nix-native image builder (#35) and cross-build support (#43)#45
[image-spec]: revert nix-native image builder (#35) and cross-build support (#43)#45jld-adriano wants to merge 2 commits intomasterfrom
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
| nix run .#docker.copyTo -- docker://$IMAGE_NAME --dest-creds "AWS:$ECR_TOKEN" \ | ||
| --image-parallel-copies 32 \ | ||
| --dest-creds "AWS:$ECR_TOKEN" |
There was a problem hiding this comment.
🟡 Duplicate --dest-creds flag in NIX_DOCKER_FILE_TEMPLATE
The NIX_DOCKER_FILE_TEMPLATE passes --dest-creds "AWS:$ECR_TOKEN" twice to the nix run .#docker.copyTo command.
Root Cause
At flytekit/image_spec/default_builder.py:197-199, the template contains:
nix run .#docker.copyTo -- docker://$IMAGE_NAME --dest-creds "AWS:$ECR_TOKEN" \
--image-parallel-copies 32 \
--dest-creds "AWS:$ECR_TOKEN"
The --dest-creds argument appears on both line 197 and line 199. Depending on how skopeo (the underlying tool for copyTo) handles duplicate flags, this could cause unexpected behavior or errors. At minimum it's redundant; at worst it could cause authentication failures if the tool doesn't handle duplicate credential flags gracefully.
Impact: Every nix-based image build will pass duplicate credential flags to the copy tool.
| nix run .#docker.copyTo -- docker://$IMAGE_NAME --dest-creds "AWS:$ECR_TOKEN" \ | |
| --image-parallel-copies 32 \ | |
| --dest-creds "AWS:$ECR_TOKEN" | |
| nix run .#docker.copyTo -- docker://$IMAGE_NAME --dest-creds "AWS:$ECR_TOKEN" \ | |
| --image-parallel-copies 32 | |
Was this helpful? React with 👍 or 👎 to provide feedback.
| if image_spec.nix: | ||
| return | ||
| docker_content = NIX_DOCKER_FILE_TEMPLATE.substitute( | ||
| IMAGE_NAME=image_spec.image_name(), | ||
| COPY_LOCAL_PACKAGES=copy_local_packages, | ||
| ECR_TOKEN=subprocess.run( | ||
| ["aws", "ecr", "get-login-password", "--region", "us-west-2"], | ||
| capture_output=True, | ||
| text=True, | ||
| check=True, | ||
| ).stdout.strip(), | ||
| ) |
There was a problem hiding this comment.
🔴 ECR token fetched unconditionally in create_docker_context for nix builds, even when not pushing
create_docker_context always calls aws ecr get-login-password when nix=True, even when the build is local-only (push=False) or has no registry configured.
Root Cause
At flytekit/image_spec/default_builder.py:773-782, the nix branch unconditionally runs:
ECR_TOKEN=subprocess.run(
["aws", "ecr", "get-login-password", "--region", "us-west-2"],
capture_output=True, text=True, check=True,
).stdout.strip(),This is called from _build_image at line 883 via create_docker_context(image_spec, tmp_path) — the push parameter is not passed through. In the previous code (before the revert), the ECR token was only fetched when push and image_spec.registry was true (flytekit/image_spec/default_builder.py:826-829 in the old code).
This means:
- Running
_build_image(spec, push=False)withnix=Truewill fail with aCalledProcessErrorif AWS CLI isn't configured, even though no push is intended. - Running with
nix=Truebut no ECR registry will also unnecessarily fail. - The ECR token is baked into the Dockerfile as plaintext, which is a credential exposure risk (the token is written to disk and embedded in Docker build layers).
Impact: Nix builds are broken for local-only testing (push=False) and for non-ECR registries. Additionally, ECR credentials are exposed in the Dockerfile on disk.
Prompt for agents
In flytekit/image_spec/default_builder.py, the create_docker_context function at line 773-783 unconditionally fetches an ECR token when nix=True. This should be conditional on whether a push is actually needed and whether the registry is ECR. However, create_docker_context does not receive the push parameter. The fix requires either: (1) passing a push parameter to create_docker_context and only fetching the ECR token when push=True and the registry is ECR, or (2) moving the ECR token fetching logic out of create_docker_context and into _build_image where the push parameter is available, and passing it as a parameter to the template substitution. Additionally, consider using Docker build secrets (--secret) instead of embedding the token in the Dockerfile to avoid credential exposure.
Was this helpful? React with 👍 or 👎 to provide feedback.
| try: | ||
| result = run(["docker", "info"], capture_output=True, text=True) | ||
| if result.returncode != 0: | ||
| raise RuntimeError( | ||
| f"Docker daemon is not running or not accessible. Error: {result.stderr}\n" | ||
| "Please start Docker daemon or use depot by setting use_depot=True" | ||
| ) | ||
| except Exception as e: | ||
| raise RuntimeError( | ||
| f"Failed to check Docker daemon status: {str(e)}\n" | ||
| "Please ensure Docker is properly installed and running, or use depot by setting use_depot=True" | ||
| ) |
There was a problem hiding this comment.
🟡 except Exception catches and re-wraps the RuntimeError raised for non-running Docker daemon
When the Docker daemon check fails (returncode != 0), the specific RuntimeError is caught by the broad except Exception handler and re-raised with a less informative message.
Root Cause
At flytekit/image_spec/default_builder.py:868-879:
try:
result = run(["docker", "info"], capture_output=True, text=True)
if result.returncode != 0:
raise RuntimeError(
f"Docker daemon is not running or not accessible. Error: {result.stderr}\n"
"Please start Docker daemon or use depot by setting use_depot=True"
)
except Exception as e:
raise RuntimeError(
f"Failed to check Docker daemon status: {str(e)}\n"
"Please ensure Docker is properly installed and running, or use depot by setting use_depot=True"
)When result.returncode != 0, the RuntimeError at line 871 is raised inside the try block. This is immediately caught by except Exception as e at line 875, and a new RuntimeError is raised wrapping the original. The user sees a confusing double-wrapped message like:
Failed to check Docker daemon status: Docker daemon is not running or not accessible. Error: ...
Please start Docker daemon or use depot by setting use_depot=True
Please ensure Docker is properly installed and running, or use depot by setting use_depot=True
The except block was intended to catch errors from run() itself (e.g., FileNotFoundError), not from the explicit RuntimeError raise.
Impact: Users see a confusing, double-wrapped error message when Docker daemon is not running.
| try: | |
| result = run(["docker", "info"], capture_output=True, text=True) | |
| if result.returncode != 0: | |
| raise RuntimeError( | |
| f"Docker daemon is not running or not accessible. Error: {result.stderr}\n" | |
| "Please start Docker daemon or use depot by setting use_depot=True" | |
| ) | |
| except Exception as e: | |
| raise RuntimeError( | |
| f"Failed to check Docker daemon status: {str(e)}\n" | |
| "Please ensure Docker is properly installed and running, or use depot by setting use_depot=True" | |
| ) | |
| # Check if Docker daemon is running | |
| try: | |
| result = run(["docker", "info"], capture_output=True, text=True) | |
| except Exception as e: | |
| raise RuntimeError( | |
| f"Failed to check Docker daemon status: {str(e)}\n" | |
| "Please ensure Docker is properly installed and running, or use depot by setting use_depot=True" | |
| ) | |
| if result.returncode != 0: | |
| raise RuntimeError( | |
| f"Docker daemon is not running or not accessible. Error: {result.stderr}\n" | |
| "Please start Docker daemon or use depot by setting use_depot=True" | |
| ) |
Was this helpful? React with 👍 or 👎 to provide feedback.
Tracking issue
Reverts #35 and #43.
Why are the changes needed?
Reverting the nix-native image builder changes (#35) and the cross-build support (#43, which depended on #35) per request. This restores the previous depot-based build path for
nix=TrueImageSpec builds.What changes were proposed in this pull request?
Two
git revert -m 1operations in reverse chronological order:This restores:
use_depotdefault back toTrueNIX_DOCKER_FILE_TEMPLATE(Docker-in-Docker approach: installs nix inside Ubuntu container, runsnix run .#docker.copyTo)nix=True(uses--project bf5bv9t2mj)use_depot=Trueas alternativerun(command, check=True)instead of manual returncode + credential redactionHow was this patch tested?
Mechanical revert of two merge commits. Existing unit tests updated to match restored error messages.
use_depotdefault isTrueagain: AnyImageSpecwithout explicituse_depot=Falsewill use depot. Verify this is the desired behavior.run(command, check=True)which can leak credentials inCalledProcessError.cmdon failure. [image-spec]: nix-native image builder, churn depot dependency #35 had added redaction logic that is now removed.--dest-credsinNIX_DOCKER_FILE_TEMPLATE(line ~196) — pre-existing issue restored by this revert.Related PRs
Link to Devin run: https://app.devin.ai/sessions/af4653767caf4f7aa96c81f4244e94a6
Requested by: @jld-adriano