Skip to content

[flytekit]: fix pod_template serialization for dynamic task node overrides#38

Open
Vervious wants to merge 1 commit intomasterfrom
devin/1771467648-fix-pod-template-serialization
Open

[flytekit]: fix pod_template serialization for dynamic task node overrides#38
Vervious wants to merge 1 commit intomasterfrom
devin/1771467648-fix-pod-template-serialization

Conversation

@Vervious
Copy link

Why are the changes needed?

When a dynamic task uses with_overrides(pod_template=...), the pod spec in the TaskNodeOverrides was only serialized when should_fast_serialize() returned True. During dynamic task compilation at runtime, should_fast_serialize() is typically False, so override_pod_spec remained {} (empty dict). However, the TaskNodeOverrides still constructed a PodTemplate(pod_spec={}, ...) because entity._pod_template is checked separately downstream. This resulted in the propeller receiving a node with an empty pod spec override, causing the child task to stay in UNKNOWN status.

This was observed in execution byc-812vfva9fvq7 where a post-training dynamic task's child showed UNKNOWN status indefinitely.

The fork had diverged from upstream flytekit here — upstream always calls _serialize_pod_spec when a pod_template exists, and only gates set_command_fn behind should_fast_serialize().

What changes were proposed in this pull request?

Moved _serialize_pod_spec outside the should_fast_serialize() gate so it's called whenever entity._pod_template is not None, matching upstream behavior. The set_command_fn call (which rewrites the entrypoint for fast registration) remains gated behind should_fast_serialize() with the existing ContainerTask guard.

Before:

if entity._pod_template is not None and settings.should_fast_serialize():
    if not isinstance(entity.flyte_entity, ContainerTask):
        entity.flyte_entity.set_command_fn(...)
    override_pod_spec = _serialize_pod_spec(...)

After:

if entity._pod_template is not None:
    if settings.should_fast_serialize() and not isinstance(entity.flyte_entity, ContainerTask):
        entity.flyte_entity.set_command_fn(...)
    override_pod_spec = _serialize_pod_spec(...)

How was this patch tested?

  • Existing translator unit tests pass (7/7).
  • Manual code review confirming alignment with upstream flytekit.

Human review checklist

  • Verify that _serialize_pod_spec + _get_container produce valid output when set_command_fn has NOT been called (non-fast-serialize path). This is the main behavioral change.
  • Confirm that the original fork divergence (adding the ContainerTask isinstance check) is preserved in the new structure.
  • Consider whether a regression test for pod_template serialization without fast_serialize is warranted.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Link to Devin run: https://app.devin.ai/sessions/2d5b0a201d6d472db332fd9f20485019
Requested by: @Vervious

@devin-ai-integration
Copy link

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

…fast_serialize

The pod_spec in TaskNodeOverrides was only serialized when
should_fast_serialize() was True. This meant dynamic tasks with
pod_template overrides (via with_overrides) would produce an empty
pod_spec in the DynamicJobSpec when fast_serialize was disabled,
causing the propeller to fail to schedule the child task (UNKNOWN status).

Aligns with upstream flytekit behavior: always serialize pod_spec when
a pod_template override exists; only gate set_command_fn behind
should_fast_serialize().

Co-Authored-By: benchan@exa.ai <ben@vervious.com>
@devin-ai-integration devin-ai-integration bot force-pushed the devin/1771467648-fix-pod-template-serialization branch from 91f78e0 to 793ca45 Compare February 21, 2026 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants