Add ppc64le backend (supports p8 and above architectures) by dannytsen · Pull Request #1648 · pq-code-package/mlkem-native

dannytsen · 2026-04-06T22:16:15Z

Updated patch for ML_KEM for ppc64le supports p8 and above architectures.

Run scripts/autogen and scripts/lint on Mac but not sure if it runs for ppc64le.
Run simpasm on Red Hat Linux.
Added detailed comments on NTT and INTT implementations.
Used C type symbols to improve readability.
Fixed some typos.

Signed-off-by: Danny Tsen dtsen@us.ibm.com

The following tests were run on p10.

[09:28] danny@ltcden12-lp1 new_ppc64le_mlkem % ./scripts/tests func
INFO > Functional Test Compile (native no_opt): make func OPT=0 AUTO=1 -j40
INFO > Functional Test ML-KEM-512 (native no_opt): make run_func_512 -j40
INFO > Functional Test ML-KEM-768 (native no_opt): make run_func_768 -j40
INFO > Functional Test ML-KEM-1024 (native no_opt): make run_func_1024 -j40
INFO > Functional Test Compile (native opt): make func OPT=1 AUTO=1 -j40
INFO > Functional Test ML-KEM-512 (native opt): make run_func_512 -j40
INFO > Functional Test ML-KEM-768 (native opt): make run_func_768 -j40
INFO > Functional Test ML-KEM-1024 (native opt): make run_func_1024 -j40
All good!

[09:28] danny@ltcden12-lp1 new_ppc64le_mlkem % ./scripts/tests bench -c PERF
INFO > Benchmark Compile (native no_opt): make bench OPT=0 AUTO=1 CYCLES=PERF -j40
INFO > Benchmark ML-KEM-512 (native no_opt): make run_bench_512
INFO > Benchmark ML-KEM-512 (native no_opt): test/build/mlkem512/bin/bench_mlkem512
keypair cycles = 66982
encaps cycles = 78820
decaps cycles = 100923

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 66438 66690 66791 66857 66920 66982 67043 67122 67218 67306 71905
encaps percentiles: 78322 78516 78618 78687 78752 78820 78878 78933 79012 79116 83825
decaps percentiles: 100427 100634 100733 100804 100869 100923 100985 101056 101131 101253 105852

INFO > Benchmark ML-KEM-768 (native no_opt): make run_bench_768
INFO > Benchmark ML-KEM-768 (native no_opt): test/build/mlkem768/bin/bench_mlkem768
keypair cycles = 111380
encaps cycles = 125891
decaps cycles = 154364

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 110575 110914 111083 111192 111291 111380 111496 111617 111821 112414 116725
encaps percentiles: 125081 125403 125526 125655 125776 125891 125998 126122 126293 126771 131358
decaps percentiles: 153575 153870 154008 154131 154261 154364 154487 154630 154782 155313 159863

INFO > Benchmark ML-KEM-1024 (native no_opt): make run_bench_1024
INFO > Benchmark ML-KEM-1024 (native no_opt): test/build/mlkem1024/bin/bench_mlkem1024
keypair cycles = 166809
encaps cycles = 185315
decaps cycles = 220229

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 165339 165995 166236 166435 166616 166809 167007 167200 167505 171058 175606
encaps percentiles: 183839 184563 184778 184951 185158 185315 185518 185744 186123 189637 192014
decaps percentiles: 218911 219430 219705 219841 220027 220229 220436 220673 221029 224484 226901

INFO > Benchmark Compile (native opt): make bench OPT=1 AUTO=1 CYCLES=PERF -j40
INFO > Benchmark ML-KEM-512 (native opt): make run_bench_512
INFO > Benchmark ML-KEM-512 (native opt): test/build/mlkem512/bin/bench_mlkem512
keypair cycles = 45750
encaps cycles = 50661
decaps cycles = 63561

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 45248 45469 45546 45620 45690 45750 45806 45886 45954 46063 50703
encaps percentiles: 50192 50367 50468 50542 50600 50661 50710 50771 50858 50954 55652
decaps percentiles: 63091 63276 63381 63436 63497 63561 63623 63679 63743 63857 68437

INFO > Benchmark ML-KEM-768 (native opt): make run_bench_768
INFO > Benchmark ML-KEM-768 (native opt): test/build/mlkem768/bin/bench_mlkem768
keypair cycles = 79045
encaps cycles = 86455
decaps cycles = 103878

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 78313 78578 78742 78847 78954 79045 79169 79285 79470 79978 84430
encaps percentiles: 85628 86009 86172 86272 86363 86455 86592 86711 86879 87292 92038
decaps percentiles: 103041 103399 103540 103676 103788 103878 103993 104104 104274 104736 109361

INFO > Benchmark ML-KEM-1024 (native opt): make run_bench_1024
INFO > Benchmark ML-KEM-1024 (native opt): test/build/mlkem1024/bin/bench_mlkem1024
keypair cycles = 124072
encaps cycles = 134500
decaps cycles = 157090

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 122727 123259 123515 123720 123929 124072 124253 124527 125009 128466 133334
encaps percentiles: 133064 133681 133933 134129 134320 134500 134711 134933 135346 138753 141067
decaps percentiles: 155503 156261 156510 156694 156894 157090 157285 157605 158014 161592 166723

All good!

dannytsen · 2026-04-06T22:19:30Z

@bhess @mkannwischer Please review. Thanks.

mkannwischer · 2026-04-07T00:14:37Z

@bhess @mkannwischer Please review. Thanks.

Thanks @dannytsen for rebasing this PR. Could you get it through CI please?

bhess

Thanks for the updates, @dannytsen.

Minor comments are inline.

supports p8 and above architectures.

I do not think this is correct as written. I see illegal instructions with qemu-ppc64le -cpu power8, while the same code runs with -cpu power9. In particular, lxvx and stxvx do not appear to be supported on p8.

Can you please double-check this and update the requirement to POWER9 if that is indeed the minimum supported target?

dannytsen · 2026-04-07T12:12:30Z

Thanks for the updates, @dannytsen.

Minor comments are inline.

supports p8 and above architectures.

I do not think this is correct as written. I see illegal instructions with qemu-ppc64le -cpu power8, while the same code runs with -cpu power9. In particular, lxvx and stxvx do not appear to be supported on p8.

Can you please double-check this and update the requirement to POWER9 if that is indeed the minimum supported target?

@mkannwischer I tested on p8 system. I have to double check that. Ah, I found the problem, my p8 supports ISA 2.07 which means the qemu supports older one. I'll fix that along with @bhess comments. Thanks.

dannytsen · 2026-04-07T14:34:35Z

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

dannytsen · 2026-04-07T15:49:45Z

@mkannwischer @bhess Submitted an updated code after re-run autogen. Please review.

dannytsen · 2026-04-08T11:50:23Z

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

mkannwischer · 2026-04-08T11:56:41Z

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

Thanks for the progress.

You can ignore the two OpenTitan tests (those should not be run on a fork because the AWS auth fails) and the markdown lint (that is a spurious failure).
The ppc one needs investigation - you should be able to run the examples locally to reproduce it.

dannytsen · 2026-04-08T12:01:21Z

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

Thanks for the progress.

You can ignore the two OpenTitan tests (those should not be run on a fork because the AWS auth fails) and the markdown lint (that is a spurious failure). The ppc one needs investigation - you should be able to run the examples locally to reproduce it.

OK. I'll check into the ppc one. Thanks.

bhess · 2026-04-08T14:53:41Z

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

Thanks for the updates. I have a few more minor follow-ups inline.

dannytsen · 2026-04-08T19:07:37Z

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

Thanks for the updates. I have a few more minor follow-ups inline.

@bhess I did not see your follow-ups or I missed it. Can you post it again? Thanks.

bhess · 2026-04-08T14:50:55Z

+#define V_26    2
+#define V_MKQ   3
+
+.machine "any"


same comment here

I believe this directive will get removed after simpasm.

does it still make sense to include it in the dev-folder then, e.g., for documentation purposes?

@bhess Well, this code still supports p8 but with ISA 2.07. Some older p8s don't support ISA 2.07. Looks like qemu supports older p8 not with ISA 2.07. In general, we support p9 but also p8 with ISA 2.07.

hm, is there any way to make this distinction with the .machine pseudo-op?

That's possible. I'll think about it.

bhess · 2026-04-09T06:04:31Z

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

Thanks for the updates. I have a few more minor follow-ups inline.

@bhess I did not see your follow-ups or I missed it. Can you post it again? Thanks.

Should be visible now.

mkannwischer · 2026-04-09T06:06:06Z

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

Thanks for the progress.
You can ignore the two OpenTitan tests (those should not be run on a fork because the AWS auth fails) and the markdown lint (that is a spurious failure). The ppc one needs investigation - you should be able to run the examples locally to reproduce it.

OK. I'll check into the ppc one. Thanks.

If you rebase on top of main, the OpenTitan tests should be properly skipped (fixed in #1649)

dannytsen · 2026-04-09T11:30:42Z

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

bhess · 2026-04-09T12:30:49Z

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

I’ve opened a PR against your branch that updates autogen.
Running the updated autogen should produce identical results to your current manually modified code. PR: dannytsen#1

dannytsen · 2026-04-09T12:49:02Z

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

I’ve opened a PR against your branch that updates autogen. Running the updated autogen should produce identical results to your current manually modified code. PR: dannytsen#1

@bhess Thanks.

dannytsen · 2026-04-10T12:06:15Z

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

I’ve opened a PR against your branch that updates autogen. Running the updated autogen should produce identical results to your current manually modified code. PR: dannytsen#1

@bhess Thanks.

@bhess I merged the autogen and rerun the autogen. Hope this commit fix the problem. Thanks.

dannytsen · 2026-04-10T17:28:01Z

@bhess I just added a capability check of PPC_FEATURE2_ARCH_2_07. It will fall back to the original implementation if the architecture doesn't support ISA 2.07 and above which include some p8 and p9 above. Please review. Thanks.

dannytsen · 2026-04-13T11:35:55Z

@mkannwischer @bhess Any feedback on the recent commit? Thanks.

dannytsen · 2026-04-13T11:59:42Z

@mkannwischer In qemu, power8e supports ISA 2.07 which is a variant of p8 with DD2.

bhess

Thanks for the update, @dannytsen.
The runtime capability check goes in a slightly different direction than what I had in mind. My suggestion was about whether the .machine pseudo-op can express something like "POWER8 with ISA 2.07 support". If there's no way to express this with .machine, I'd suggest just removing it from the assembly and documenting the ISA 2.07 requirement in the project docs / README instead (instead of machine "any").

For the dynamic getauxval approach: I wonder if runtime feature detection might be better left to downstream consumers of mlkem-native (e.g., liboqs) rather than adding it here. There's also the consideration that getauxval/<sys/auxv.h> is not portable AFAIK. Any thoughts @mkannwischer ?

dannytsen · 2026-04-13T12:21:42Z

Thanks for the update, @dannytsen. The runtime capability check goes in a slightly different direction than what I had in mind. My suggestion was about whether the .machine pseudo-op can express something like "POWER8 with ISA 2.07 support". If there's no way to express this with .machine, I'd suggest just removing it from the assembly and documenting the ISA 2.07 requirement in the project docs / README instead (instead of machine "any").

For the dynamic getauxval approach: I wonder if runtime feature detection might be better left to upstream consumers of mlkem-native (e.g., liboqs) rather than adding it here. There's also the consideration that getauxval/<sys/auxv.h> is not portable AFAIK. Any thoughts @mkannwischer ?

@bhess ".machine any" is already a psudo-op. I can remove that in dev/ppc64le/src. The README is already had the update.

For getauxval approach, it's to prevent the code run on a older Power systems that did not support proper instaructions.

bhess

Coming back to this after a closer look at runtime CPU checks in mlkem-native:
my remaining concern is the runtime capability check in the PR. mlkem-native already has a designed-in mechanism for exactly this case, and I think the PR should use it rather than doing its own in the native backend.

Hook: MLK_CONFIG_CUSTOM_CAPABILITY_FUNC + the
mlk_sys_check_capability(mlk_sys_cap) callback declared in
mlkem/src/sys.h.
Backends call it; consumers override it for runtime dispatch.
Examples in other backends:
- test/configs/custom_native_capability_config_CPUID_AVX2.h
- test/configs/custom_native_capability_config_ID_AA64PFR1_EL1.h

Concretely, this would mean adding MLK_SYS_CAP_PPC_ISA_2_07 to the enum,
replacing the four mlkem_ppc_check_cap() calls with
mlk_sys_check_capability(...), and shipping the auxv code as a new
test/configs/custom_native_capability_config_PPC_AUXV.h. Specifics are inline.

@mkannwischer: please correct me if that's not the intended pattern, and feel free to share any additional review.

bhess · 2026-04-29T14:58:04Z

+#include "../api.h"
+#include "src/arith_native_ppc64le.h"
+
+#include <sys/auxv.h>


Because this header is pulled in by everyone that uses the native backend (via src/native/meta.h), this include makes the whole native build glibc-dependent.

The existing AVX2 and NEON backends don't query the OS, they assume the ISA the build was configured for, and runtime dispatch is delegated to the consumer through MLK_CONFIG_CUSTOM_CAPABILITY_FUNC (see test/configs/custom_native_capability_config_CPUID_AVX2.h and ..._ID_AA64PFR1_EL1.h).
I'd advise to use the same pattern here: drop this include, and move the auxv-based detection into a new
test/configs/custom_native_capability_config_PPC_AUXV.h example.

I believe this is auto generated.

Ah yes, anyway I think this is moot and not a blocker if the checks are removed for now (see the other thread). I can handle the autogen in a followup.

bhess

Thanks for the updates @dannytsen. I re-reviewed this together with the earlier discussion in #1184. The removal of the runtime capability check resolves my previous comment, and the main readability points raised on the earlier PR also seem addressed for this iteration. I’m not the maintainer, but from my side this iteration now looks good and I’d support merging it.

- ruff format scripts/autogen (formatting fix) - Add check-magic annotation for array size 2072 in consts.c and consts.h (7 groups of 8 base constants + 4 twiddle tables * 63 rows * 8 values) Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

bhess · 2026-05-21T07:37:56Z

Thanks @mkannwischer. I pushed an update that fixes the linting (linting now passes locally).

mkannwischer · 2026-05-21T08:15:16Z

Thanks @mkannwischer. I pushed an update that fixes the linting (linting now passes locally).

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

bhess · 2026-05-21T08:16:22Z

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

Yes just noticed it, will take a look.

mkannwischer · 2026-05-21T08:28:56Z

For the HOL-Light failures, a workaround would be to add toolchain_ppc64le to the hol_light-cross, hol_light-cross-aarch64, hol_light-cross-x86_64 shells. That's what we did for gcc-arm-embedded, too. That is fine for now until @hanno-becker or I can take a look on how to make --force-cross a little more selective.

…power8 (nix libc is compiled for power8 and otherwise causes illegal instructions. Avoids unused data parameter errors in the fallback code path. Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

bhess · 2026-05-21T12:09:37Z

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

The Illegal instruction issue doen't appear to come from the mlkem code itself, but from the nix toolchain that uses a glibc compiled for power8, so binaries fault when run under qemu-ppc64le -cpu power7.
As a workaround, I changed the CI to still compile with -mcpu=power7 to enable the fallback, but to run the resulting binary with qemu-ppc64le -cpu power8 so that the binary runs properly. I think a cleaner long-term fix would be a separate ppc64le toolchain built for power7.

I also reproduced the fallback path locally. While doing that I hit an issue in the fallback-path (unused variable) fixed in commit 3442786.

For the HOL-Light failures, a workaround would be to add toolchain_ppc64le to the hol_light-cross, hol_light-cross-aarch64, hol_light-cross-x86_64 shells. That's what we did for gcc-arm-embedded, too. That is fine for now until @hanno-becker or I can take a look on how to make --force-cross a little more selective.

Done in the latest commit.

mkannwischer · 2026-05-21T14:46:27Z

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

The Illegal instruction issue doen't appear to come from the mlkem code itself, but from the nix toolchain that uses a glibc compiled for power8, so binaries fault when run under qemu-ppc64le -cpu power7. As a workaround, I changed the CI to still compile with -mcpu=power7 to enable the fallback, but to run the resulting binary with qemu-ppc64le -cpu power8 so that the binary runs properly. I think a cleaner long-term fix would be a separate ppc64le toolchain built for power7.

I also reproduced the fallback path locally. While doing that I hit an issue in the fallback-path (unused variable) fixed in commit 3442786.

Thanks for investigating. That's annoying. I think this solution is acceptable for now and I'm glad that it at least revealed some problems we would have otherwise missed.

I have updated #1677 so we can run the full CI and benchmarks there.

hanno-becker · 2026-05-30T10:20:03Z

+#if defined(MLK_ARITH_BACKEND_PPC64LE_DEFAULT) && \
+    !defined(MLK_CONFIG_MULTILEVEL_NO_SHARED) && defined(__POWER8_VECTOR__)
+
+#include "consts.h"


Why is the entire file not autogenerated?

Updated in cc9087d which autogenerates the full consts.c file (without needing .inc files)

hanno-becker · 2026-05-30T10:20:34Z

+
+/* Offsets into the constant table */
+/* check-magic: off */
+#define MLK_PPC_NQ_OFFSET 0


If we auto-generate consts.c, we can also auto-generate those offsets from the same code.

Updated in cc9087d, consts.h is now auto-generated.

hanno-becker · 2026-05-30T10:33:56Z

+
+/*
+ * Load coefficient in r[j+len] (r') vectors from offset, R10, R17, R19 and R21
+ *  r[j+len]: V13, V18, V23, V28


It is confusing to talk about absolute register values here if the code itself uses symbolic ones.

Fixed in cc9087d.Load_4Rjp is now load_high; the comment uses the symbolic names.

hanno-becker · 2026-05-30T10:56:03Z

+        /* Since the result of the Barrett multiplication is bounded
+           by q/2 in absolute value.


That's not true, and the sentence just seems to stop.

Removed the comment in cc9087d as it was also out of context. I think the same bound applies to Barrett multiplication as in eprint 2021/986 (Algorithm 6 / Corollary 2: |bar_N(a,b)| < N),

hanno-becker · 2026-05-30T10:57:22Z

+ *  r[j]: V12, V17, V22, V27
+ */
+.macro Load_4Rj
+        lxvd2x  32+vdata_a1, rinp, a1_offset     /* V12: vector r0 */
+        lxvd2x  32+vdata_a2, rinp, a2_offset    /* V17: vector r1 */
+        lxvd2x  32+vdata_a3, rinp, a3_offset    /* V22: vector r2 */
+        lxvd2x  32+vdata_a4, rinp, a4_offset    /* V27: vector r3 */


As above: Why talk about absolute registers when using symbolic names?

Fixed in cc9087d. Load_4Rj is now load_low; the comment uses the symbolic names

hanno-becker · 2026-05-30T10:58:18Z

+ *   Compute forward NTT based on the following 7 layers -
+ *     len = 128, 64, 32, 16, 8, 4, 2.
+ *
+ *   Each layer compute the coefficients on 2 legs, start and start + len*2 offsets.


It's unclear what is meant here

Rewritten. The comment now says: each layer pairs a low coefficient r[j] (leg 1) with
a high coefficient r[j+len] (leg 2).

hanno-becker · 2026-05-30T10:59:54Z

+ *
+ *   leg 1                        leg 2
+ *   -----                        -----
+ *   start                        start+len*2


Is len*2 correct here? For len=128 it would be 256, which seems wrong.

Ok, one should clarify whether one is talking about byte-offsets or array-offsets.

Updated, the comment now states Offsets below are byte offsets into r...

hanno-becker · 2026-05-30T11:10:22Z

+ * following order,
+ *  rjlen0, rjlen1, rjlen2, rjlen3, rjlen4, rjlen5, rjlen6, rjlen7
+ */
+.macro Load_4Coeffs start, next


It's a bit confusing that this includes Load_4Rjp but not Load_4Rp. I suggest to remove this macro and just inline it into its unique call-site layer12345

Updated.. Load_4Coeffs is removed. CT_Butterfly_4x (formerly ntt_layer12345) now calls
Init_Coeffs_offset and load_high directly.

hanno-becker · 2026-05-30T11:11:12Z

+ * Load coefficient in r[j+len] (r') vectors from offset, R10, R17, R19 and R21
+ *  r[j+len]: V13, V18, V23, V28
+ */
+.macro Load_4Rjp


Please use a more descriptive name, like load_high and load_low and comment that those are the coefficient subject to / not subject to scaling in the CT butterfly.

Done: Load_4Rjp-> load_high, Load_4Rj -> load_low, each with a comment noting
which operand is scaled by zeta in the CT butterfly and which is not.

hanno-becker · 2026-05-30T11:11:53Z

+        Load_4Coeffs \start, \next
+        barrett_fqmul_4x \_vz0, \_vz1, \_vz2, \_vz3, \_vzt0, \_vzt1, \_vzt2, \_vzt3
+        Load_4Rj
+        Compute_4Coeffs


Please pick a more descriptive name for Compute_4Coeffs, e.g. AddSub_4x or something.

Done: renamed to AddSub_4x.

hanno-becker · 2026-05-30T11:12:42Z

+/*
+ * NTT other layers, 1, 2, 3, 4, 5.
+ */
+.macro ntt_layer12345 start, next, _vz0, _vz1, _vz2, _vz3, _vzt0, _vzt1, _vzt2, _vzt3


Please pick a more descriptive name for this macro. This is not an entire NTT layer, it's a set of butterflies. E.g. CT_Butterfly_4x

Done: renaming ntt_layer12345 to CT_Butterfly_4x

autogenerate full consts.c and consts.h Signed-off-by: Basil Hess <bhe@zurich.ibm.com> Sync iNTT & Improved ABI note Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

bhess · 2026-06-02T18:00:40Z

Thanks for the review, @hanno-becker! I pushed cc9087d with an update, see my replies in the inline comments.

hanno-becker

Thank you @bhess for your work 🙏

poly_tomont still uses rounding Montgomery -- can we change this to Barrett multiplication, too, please?

There's still a large number of places where I feel documentation/naming should be improved, but I'll take care of that.

hanno-becker · 2026-06-04T10:34:10Z

+        vmladduhm 15, 13, V1353, 3
+        vmladduhm 20, 18, V1353, 3
+        vmladduhm 25, 23, V1353, 3
+        vmladduhm 9, 7, V1353, 3
+
+        vmhraddshs 14, 13, V1353, 3
+        vmhraddshs 19, 18, V1353, 3
+        vmhraddshs 24, 23, V1353, 3
+        vmhraddshs 8, 7, V1353, 3
+
+        vmladduhm 15, 15, V_QINV, 3
+        vmladduhm 20, 20, V_QINV, 3
+        vmladduhm 25, 25, V_QINV, 3
+        vmladduhm 9, 9, V_QINV, 3
+
+        vmhraddshs 15, 15, V_NMKQ, 14


@bhess This is still rounding Montgomery. Can we adjust this to use Barrett multiplication as well, please?

bhess · 2026-06-04T10:55:05Z

poly_tomont still uses rounding Montgomery -- can we change this to Barrett multiplication, too, please?

Sure, I'll take care of that.

There's still a large number of places where I feel documentation/naming should be improved, but I'll take care of that.

Thank you @hanno-becker!

Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

bhess · 2026-06-05T09:56:05Z

poly_tomont still uses rounding Montgomery -- can we change this to Barrett multiplication, too, please?

I pushed an update in f511d3b changing poly_tomont to Barrett multiplication.

dannytsen requested a review from a team as a code owner April 6, 2026 22:16

dannytsen mentioned this pull request Apr 6, 2026

Add PPC64LE backend #1193

Draft

5 tasks

mkannwischer self-assigned this Apr 7, 2026

mkannwischer added the ppc64le label Apr 7, 2026

bhess reviewed Apr 7, 2026

View reviewed changes

bhess reviewed Apr 9, 2026

View reviewed changes

dannytsen force-pushed the new_main branch from 277d07f to faf69e0 Compare April 10, 2026 12:04

bhess reviewed Apr 13, 2026

View reviewed changes

Comment thread mlkem/src/native/ppc64le/meta.h Outdated

bhess reviewed Apr 29, 2026

View reviewed changes

bhess approved these changes May 4, 2026

View reviewed changes

ppc64le: fix Python and magic-constant linting

6b242ce

- ruff format scripts/autogen (formatting fix) - Add check-magic annotation for array size 2072 in consts.c and consts.h (7 groups of 8 base constants + 4 twiddle tables * 63 rows * 8 values) Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

bhess force-pushed the new_main branch from f06bbb9 to 6b242ce Compare May 21, 2026 07:33

bhess added 2 commits May 21, 2026 13:27

ppc64le: fix power7 fallback by compiling for power7 but running for …

3442786

…power8 (nix libc is compiled for power8 and otherwise causes illegal instructions. Avoids unused data parameter errors in the fallback code path. Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

ppc64le: workaround to avoid hol-light failures

12036a9

Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

hanno-becker reviewed May 30, 2026

View reviewed changes

hanno-becker reviewed May 31, 2026

View reviewed changes

Comment thread dev/ppc64le/src/ntt_ppc_asm.S

hanno-becker reviewed May 31, 2026

View reviewed changes

Comment thread dev/ppc64le/src/ntt_ppc_asm.S

address hannobecker's comments

cc9087d

autogenerate full consts.c and consts.h Signed-off-by: Basil Hess <bhe@zurich.ibm.com> Sync iNTT & Improved ABI note Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

Merge branch 'main' into new_main

6c6b686

hanno-becker requested changes Jun 4, 2026

View reviewed changes

poly_tomont with barrett

f511d3b

Signed-off-by: Basil Hess <bhe@zurich.ibm.com>

		/* Since the result of the Barrett multiplication is bounded
		by q/2 in absolute value.

Conversation

dannytsen commented Apr 6, 2026

Uh oh!

dannytsen commented Apr 6, 2026

Uh oh!

mkannwischer commented Apr 7, 2026

Uh oh!

bhess left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dannytsen commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dannytsen commented Apr 7, 2026

Uh oh!

dannytsen commented Apr 7, 2026

Uh oh!

dannytsen commented Apr 8, 2026

Uh oh!

mkannwischer commented Apr 8, 2026

Uh oh!

dannytsen commented Apr 8, 2026

Uh oh!

bhess commented Apr 8, 2026

Uh oh!

dannytsen commented Apr 8, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dannytsen Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bhess commented Apr 9, 2026

Uh oh!

mkannwischer commented Apr 9, 2026

Uh oh!

dannytsen commented Apr 9, 2026

Uh oh!

bhess commented Apr 9, 2026

Uh oh!

dannytsen commented Apr 9, 2026

Uh oh!

dannytsen commented Apr 10, 2026

Uh oh!

dannytsen commented Apr 10, 2026

Uh oh!

dannytsen commented Apr 13, 2026

Uh oh!

dannytsen commented Apr 13, 2026

Uh oh!

bhess left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dannytsen commented Apr 13, 2026

Uh oh!

bhess left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

dannytsen commented Apr 7, 2026 •

edited

Loading

dannytsen Apr 9, 2026 •

edited

Loading

bhess left a comment •

edited

Loading

bhess left a comment •

edited

Loading

bhess commented May 21, 2026 •

edited

Loading

bhess Jun 2, 2026 •

edited

Loading

hanno-becker May 30, 2026 •

edited

Loading