update for 2.29 | NKI beta 3 #123

Open
jlonge4 wants to merge 1 commit into aws-neuron:main from jlonge4:beta3-update

jlonge4 (Contributor) commented Apr 14, 2026

Issue #, if available:

N/A

Description of changes:

  1. nisa.memset strict type matching: changed value=0.0 to value=0 in nki_matmul_fully_optimized_. NKI 0.3.0 requires the value literal to match the destination tensor's dtype, and 0.0 (a Python float, which is float64) is a type mismatch against lhsT.dtype.
  2. nisa.tensor_copy, deprecated dtype parameter removed: NKI 0.3.0 removes the dtype keyword argument from nisa.tensor_copy. This PR drops it from three call sites across nki_matmul_basic_, nki_matmul_tiled_, and nki_matmul_hoist_load_. The cast is now expressed entirely through the destination tensor's declared dtype.
  3. nki_matmul_hoist_load_, corrected res_sb allocation dtype: the res_sb SBUF buffer was declared as nl.float32, although it is meant to hold the final result in the output dtype. Changed it to result.dtype to align with the other kernels and to express the PSUM→output dtype cast through the destination allocation.
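
As a rough illustration of items 2 and 3, here is a NumPy analogy (not NKI code, and not the kernels from this PR) of a cast that is driven by the destination buffer's dtype rather than by a dtype argument on the copy. The names res_psum and res_sb only mirror the description above; the shapes, dtypes, and the use of np.copyto are assumptions made for the sketch.

```python
import numpy as np

# Analogy only: NumPy arrays stand in for the PSUM accumulator and SBUF buffer.
# Nothing here calls NKI; shapes and dtypes are illustrative assumptions.
rng = np.random.default_rng(0)
lhs = rng.standard_normal((4, 8)).astype(np.float16)
rhs = rng.standard_normal((8, 4)).astype(np.float16)

# Accumulate in float32, as a matmul accumulator typically would.
res_psum = lhs.astype(np.float32) @ rhs.astype(np.float32)

# Allocate the destination with the output dtype up front ...
result_dtype = np.float16
res_sb = np.empty(res_psum.shape, dtype=result_dtype)

# ... so the copy takes no dtype argument; the cast follows from res_sb.
# np.copyto defaults to casting="same_kind", which permits float32 -> float16.
np.copyto(res_sb, res_psum)

print(res_sb.dtype)
```

If res_sb had been allocated as float32 in this sketch, the copy would keep float32 and the cast to the output dtype would have to happen somewhere else, which is the kind of mismatch item 3 describes.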

Testing:

Please see the detailed unit test requirements in CONTRIBUTING.md

  • The change is covered by numeric check using nki.baremetal
  • The change is covered by performance benchmark test using nki.benchmark
  • The change is covered by end-to-end integration test

Pull Request Checklist

  • I have filled in all the required fields in the template
  • I have tested locally that all the tests pass
  • By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.

jlonge4 changed the title from "update for beta 3" to "update for 2.29 | NKI beta 3" on Apr 14, 2026