Fix alias output naming and improve Spike I/O diagnostics by hgt312 · Pull Request #51 · aws-neuron/nkipy

hgt312 · 2026-03-31T17:13:20Z

Fix alias output naming and improve Spike I/O diagnostics

Problem

The tracer's output naming step (Step 3 in NKIPyKernel._build_code) used a truthiness check (if not r.backend_tensor.name) to decide whether to assign the canonical output{idx} name. This failed when a non-aliased output tensor already carried a name from tracing (e.g. "intermediate0" from an np.add result). The tensor kept its intermediate name, causing a mismatch between the NEFF I/O table and what callers pass to kernel(inputs={...}, outputs={...}).
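The failure mode can be reproduced in isolation. The following is a minimal sketch, not the NKIPy implementation: FakeTensor, its is_alias flag, and both helper functions are hypothetical stand-ins for r.backend_tensor and the naming step, and the sketch does not address how aliased outputs should affect the output{idx} numbering.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a traced backend tensor."""
    name: Optional[str] = None
    is_alias: bool = False  # assumption: the tracer can tell aliased outputs apart


def name_outputs_truthiness(outputs: List[FakeTensor]) -> List[str]:
    """Old behaviour as described: only unnamed tensors get output{idx}."""
    for idx, t in enumerate(outputs):
        if not t.name:
            t.name = f"output{idx}"
    return [t.name for t in outputs]


def name_outputs_canonical(outputs: List[FakeTensor]) -> List[str]:
    """Sketch of a fix: every non-aliased output gets output{idx},
    regardless of any name it picked up during tracing."""
    for idx, t in enumerate(outputs):
        if not t.is_alias:
            t.name = f"output{idx}"
    return [t.name for t in outputs]


# A result of np.add would already be called "intermediate0" after tracing.
before = name_outputs_truthiness([FakeTensor("intermediate0"), FakeTensor()])
after = name_outputs_canonical([FakeTensor("intermediate0"), FakeTensor()])
print(before)  # ['intermediate0', 'output1']
print(after)   # ['output0', 'output1']
```

With the truthiness check, the caller's outputs={"output0": ...} has no matching entry, which is the mismatch described above.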

This was discovered during sglang-nkipy integration with the stable neuronx-cc 2.23.6484.0 compiler, where three kernel call patterns broke:

Standalone aliased kernels (e.g. update_kv_cache): NEFF input renamed to kv_cache.must_alias_input but callers still passed kv_cache
Fused graphs with aliased params (e.g. prefill_pre_moe): aliased output named kv_cache shifted other outputs to output1, output2, ...
Broken-identity aliases (e.g. prefill_post_moe): mutated param passed through an NKI wrapper that breaks tensor identity, causing the tracer to auto-append a 3rd output

Debugging these required extracting NEFF I/O names from HLO protobuf binaries because Spike's _validate_io only produced a bare KeyError.
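For illustration, a name check that reports both sides of the mismatch could look like the sketch below. This is an assumption-laden example rather than Spike's actual _validate_io: the function name validate_io_names, its parameters, and the choice of ValueError are all hypothetical.

```python
from typing import Iterable, Mapping


def validate_io_names(provided: Mapping[str, object],
                      expected: Iterable[str],
                      kind: str) -> None:
    """Hypothetical helper: fail with a message that lists the names the
    caller passed and the names the compiled model expects."""
    expected_set = set(expected)
    unknown = sorted(set(provided) - expected_set)   # passed but not expected
    missing = sorted(expected_set - set(provided))   # expected but not passed
    if unknown or missing:
        raise ValueError(
            f"{kind} name mismatch: unknown={unknown}, missing={missing}, "
            f"expected={sorted(expected_set)}"
        )


# The standalone aliased-kernel case from the description above.
try:
    validate_io_names({"kv_cache": None}, ["kv_cache.must_alias_input"], "input")
except ValueError as err:
    print(err)
```

A message of this shape names the expected tensor directly, so there is no need to inspect the compiled artifact to find it.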

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Non-aliased outputs that already had a name from tracing kept that name instead of the canonical "output{idx}", breaking NEFF I/O name matching. Also adds alias input auto-resolution and better _validate_io errors in Spike, plus tests for all alias naming patterns.

vgene · 2026-04-03T09:53:06Z

spike/src/spike/spike_model.py

                f"got {actual_dtype}"
            )

+    _ALIAS_SUFFIX = ".must_alias_input"


I don't want to complicate the handling of alias naming in spike.
Spike is a wrapper on runtime, it should not need to know how we lower the function into NEFFs. It should only deal with what's available in the NEFFs.

This specific problem can be addressed at the user level? The caller can pass .must_alias_input in the input tensor list.

A proper solution can be in the NEFF lowering in NKIPy (but we want to make sure we are aligned with NKI)
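A rough sketch of what the user-level option might look like, assuming the caller can obtain the set of input names the NEFF expects. The helper with_alias_names and its parameters are hypothetical and not part of Spike or NKIPy; only the suffix string is taken from the diff above.

```python
from typing import Dict, Iterable

ALIAS_SUFFIX = ".must_alias_input"  # suffix quoted in the diff above


def with_alias_names(inputs: Dict[str, object],
                     neff_input_names: Iterable[str]) -> Dict[str, object]:
    """Hypothetical caller-side helper: rename an input key to its aliased
    form when only the aliased name exists in the NEFF."""
    known = set(neff_input_names)
    resolved = {}
    for name, value in inputs.items():
        aliased = name + ALIAS_SUFFIX
        if name not in known and aliased in known:
            resolved[aliased] = value  # caller passed "kv_cache", NEFF wants the suffixed name
        else:
            resolved[name] = value     # leave everything else untouched
    return resolved


renamed = with_alias_names(
    {"kv_cache": "cache_buffer", "x": "activations"},
    {"kv_cache.must_alias_input", "x"},
)
print(renamed)  # {'kv_cache.must_alias_input': 'cache_buffer', 'x': 'activations'}
```

Keeping this in caller code leaves Spike unaware of the lowering convention, at the cost of each caller having to know the suffix.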

vgene · 2026-04-03T09:54:00Z

spike/src/spike/spike_model.py

        model_core_id = self.model_ref.core_id
+
+        unknown_inputs = set(inputs) - set(self.input_tensors_info)
+        if unknown_inputs:


I like the checks!

hgt312 requested a review from a team March 31, 2026 17:13

vgene requested changes Apr 3, 2026

hgt312 wants to merge 1 commit into aws-neuron:main from hgt312:alias-output-naming-fix
