Add ppc64le backend (supports p8 and above architectures) [full CI] #1677
mkannwischer wants to merge 2 commits into
Mac Mini (M1, 2020) benchmarks
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12319 cycles | 12319 cycles | 1 |
| ML-KEM-512 encaps | 14997 cycles | 14997 cycles | 1 |
| ML-KEM-512 decaps | 19549 cycles | 19551 cycles | 1.00 |
| ML-KEM-768 keypair | 21263 cycles | 21264 cycles | 1.00 |
| ML-KEM-768 encaps | 23873 cycles | 23874 cycles | 1.00 |
| ML-KEM-768 decaps | 30417 cycles | 30425 cycles | 1.00 |
| ML-KEM-1024 keypair | 30328 cycles | 30327 cycles | 1.00 |
| ML-KEM-1024 encaps | 34573 cycles | 34573 cycles | 1 |
| ML-KEM-1024 decaps | 44189 cycles | 44190 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
ppc64le (POWER10) benchmarks
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 38059 cycles | 59591 cycles | 0.64 |
| ML-KEM-512 encaps | 43668 cycles | 72335 cycles | 0.60 |
| ML-KEM-512 decaps | 53803 cycles | 92227 cycles | 0.58 |
| ML-KEM-768 keypair | 67131 cycles | 97410 cycles | 0.69 |
| ML-KEM-768 encaps | 76341 cycles | 114292 cycles | 0.67 |
| ML-KEM-768 decaps | 90588 cycles | 139751 cycles | 0.65 |
| ML-KEM-1024 keypair | 108574 cycles | 151020 cycles | 0.72 |
| ML-KEM-1024 encaps | 119069 cycles | 169141 cycles | 0.70 |
| ML-KEM-1024 decaps | 137495 cycles | 200776 cycles | 0.68 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12048 cycles | 12038 cycles | 1.00 |
| ML-KEM-512 encaps | 13632 cycles | 13787 cycles | 0.99 |
| ML-KEM-512 decaps | 17778 cycles | 17801 cycles | 1.00 |
| ML-KEM-768 keypair | 21266 cycles | 21014 cycles | 1.01 |
| ML-KEM-768 encaps | 22146 cycles | 22184 cycles | 1.00 |
| ML-KEM-768 decaps | 28443 cycles | 28329 cycles | 1.00 |
| ML-KEM-1024 keypair | 29577 cycles | 29959 cycles | 0.99 |
| ML-KEM-1024 encaps | 31745 cycles | 31722 cycles | 1.00 |
| ML-KEM-1024 decaps | 39476 cycles | 39346 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 5f3d329 | Previous: 2ee902c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12044 cycles | 9751 cycles | 1.24 |
| ML-KEM-512 encaps | 13624 cycles | 11423 cycles | 1.19 |
| ML-KEM-512 decaps | 17783 cycles | 15570 cycles | 1.14 |
| ML-KEM-768 keypair | 21292 cycles | 16302 cycles | 1.31 |
| ML-KEM-768 encaps | 22015 cycles | 17954 cycles | 1.23 |
| ML-KEM-768 decaps | 28023 cycles | 23461 cycles | 1.19 |
| ML-KEM-1024 keypair | 29562 cycles | 22439 cycles | 1.32 |
| ML-KEM-1024 encaps | 31715 cycles | 24509 cycles | 1.29 |
| ML-KEM-1024 decaps | 39395 cycles | 32178 cycles | 1.22 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 14207 cycles | 14376 cycles | 0.99 |
| ML-KEM-512 encaps | 15984 cycles | 16060 cycles | 1.00 |
| ML-KEM-512 decaps | 21538 cycles | 21627 cycles | 1.00 |
| ML-KEM-768 keypair | 25114 cycles | 24794 cycles | 1.01 |
| ML-KEM-768 encaps | 25658 cycles | 25550 cycles | 1.00 |
| ML-KEM-768 decaps | 33523 cycles | 33409 cycles | 1.00 |
| ML-KEM-1024 keypair | 34848 cycles | 37228 cycles | 0.94 |
| ML-KEM-1024 encaps | 36114 cycles | 37346 cycles | 0.97 |
| ML-KEM-1024 decaps | 47236 cycles | 46787 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12796 cycles | 12777 cycles | 1.00 |
| ML-KEM-512 encaps | 14285 cycles | 14269 cycles | 1.00 |
| ML-KEM-512 decaps | 19148 cycles | 19117 cycles | 1.00 |
| ML-KEM-768 keypair | 22525 cycles | 22412 cycles | 1.01 |
| ML-KEM-768 encaps | 23071 cycles | 23051 cycles | 1.00 |
| ML-KEM-768 decaps | 30094 cycles | 30064 cycles | 1.00 |
| ML-KEM-1024 keypair | 34224 cycles | 32997 cycles | 1.04 |
| ML-KEM-1024 encaps | 33002 cycles | 33104 cycles | 1.00 |
| ML-KEM-1024 decaps | 42405 cycles | 42483 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28337 cycles | 28237 cycles | 1.00 |
| ML-KEM-512 encaps | 36782 cycles | 36623 cycles | 1.00 |
| ML-KEM-512 decaps | 45398 cycles | 45123 cycles | 1.01 |
| ML-KEM-768 keypair | 46215 cycles | 46315 cycles | 1.00 |
| ML-KEM-768 encaps | 55787 cycles | 55593 cycles | 1.00 |
| ML-KEM-768 decaps | 69875 cycles | 69917 cycles | 1.00 |
| ML-KEM-1024 keypair | 70417 cycles | 70363 cycles | 1.00 |
| ML-KEM-1024 encaps | 82459 cycles | 82510 cycles | 1.00 |
| ML-KEM-1024 decaps | 99343 cycles | 99218 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-1024 keypair | 34224 cycles | 32997 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
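The alert above lists only the rows whose Current/Previous ratio exceeds the 1.03 threshold. A minimal sketch of that filter follows, assuming the ratio is simply current cycles divided by previous cycles (which is consistent with the tables in this thread) and that a larger cycle count is worse. The function name `flag_regressions` and the row layout are hypothetical and are not taken from github-action-benchmark itself.

```python
# Hypothetical helper: not the github-action-benchmark implementation.
def flag_regressions(rows, threshold=1.03):
    """Return (name, ratio) for rows whose current/previous ratio exceeds threshold."""
    flagged = []
    for name, current, previous in rows:
        ratio = current / previous  # assumption: ratio = current / previous
        if ratio > threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged


# A few rows copied from the AMD EPYC 4th gen (c7a) table in this thread.
c7a_rows = [
    ("ML-KEM-512 keypair", 12796, 12777),
    ("ML-KEM-768 keypair", 22525, 22412),
    ("ML-KEM-1024 keypair", 34224, 32997),
    ("ML-KEM-1024 encaps", 33002, 33104),
]

for name, ratio in flag_regressions(c7a_rows):
    print(f"{name}: ratio {ratio} exceeds threshold")
```

With these rows only ML-KEM-1024 keypair is reported, matching the single-row alert for this machine.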
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17603 cycles | 17547 cycles | 1.00 |
| ML-KEM-512 encaps | 19900 cycles | 19938 cycles | 1.00 |
| ML-KEM-512 decaps | 26420 cycles | 26450 cycles | 1.00 |
| ML-KEM-768 keypair | 31203 cycles | 31168 cycles | 1.00 |
| ML-KEM-768 encaps | 31989 cycles | 32415 cycles | 0.99 |
| ML-KEM-768 decaps | 41468 cycles | 41536 cycles | 1.00 |
| ML-KEM-1024 keypair | 43770 cycles | 43998 cycles | 0.99 |
| ML-KEM-1024 encaps | 45855 cycles | 46270 cycles | 0.99 |
| ML-KEM-1024 decaps | 58042 cycles | 58266 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 5f3d329 | Previous: 2ee902c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17588 cycles | 16301 cycles | 1.08 |
| ML-KEM-512 encaps | 19896 cycles | 18736 cycles | 1.06 |
| ML-KEM-512 decaps | 26412 cycles | 25234 cycles | 1.05 |
| ML-KEM-768 keypair | 31190 cycles | 28649 cycles | 1.09 |
| ML-KEM-768 encaps | 31768 cycles | 30001 cycles | 1.06 |
| ML-KEM-1024 keypair | 43790 cycles | 37884 cycles | 1.16 |
| ML-KEM-1024 encaps | 45790 cycles | 40704 cycles | 1.12 |
| ML-KEM-1024 decaps | 58108 cycles | 54265 cycles | 1.07 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17686 cycles | 17646 cycles | 1.00 |
| ML-KEM-512 encaps | 20642 cycles | 20608 cycles | 1.00 |
| ML-KEM-512 decaps | 27077 cycles | 27084 cycles | 1.00 |
| ML-KEM-768 keypair | 29981 cycles | 29899 cycles | 1.00 |
| ML-KEM-768 encaps | 32757 cycles | 32774 cycles | 1.00 |
| ML-KEM-768 decaps | 42007 cycles | 41962 cycles | 1.00 |
| ML-KEM-1024 keypair | 43716 cycles | 43745 cycles | 1.00 |
| ML-KEM-1024 encaps | 48773 cycles | 48719 cycles | 1.00 |
| ML-KEM-1024 decaps | 61379 cycles | 61386 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 40197 cycles | 40266 cycles | 1.00 |
| ML-KEM-512 encaps | 48368 cycles | 48417 cycles | 1.00 |
| ML-KEM-512 decaps | 62501 cycles | 62596 cycles | 1.00 |
| ML-KEM-768 keypair | 63804 cycles | 63800 cycles | 1.00 |
| ML-KEM-768 encaps | 74937 cycles | 74978 cycles | 1.00 |
| ML-KEM-768 decaps | 93413 cycles | 93631 cycles | 1.00 |
| ML-KEM-1024 keypair | 95310 cycles | 95102 cycles | 1.00 |
| ML-KEM-1024 encaps | 109384 cycles | 109294 cycles | 1.00 |
| ML-KEM-1024 decaps | 132137 cycles | 132065 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 36602 cycles | 36810 cycles | 0.99 |
| ML-KEM-512 encaps | 43115 cycles | 43058 cycles | 1.00 |
| ML-KEM-512 decaps | 55703 cycles | 55671 cycles | 1.00 |
| ML-KEM-768 keypair | 58678 cycles | 58693 cycles | 1.00 |
| ML-KEM-768 encaps | 67624 cycles | 67471 cycles | 1.00 |
| ML-KEM-768 decaps | 84521 cycles | 84392 cycles | 1.00 |
| ML-KEM-1024 keypair | 89114 cycles | 89088 cycles | 1.00 |
| ML-KEM-1024 encaps | 99256 cycles | 99346 cycles | 1.00 |
| ML-KEM-1024 decaps | 120774 cycles | 120756 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 45749 cycles | 45722 cycles | 1.00 |
| ML-KEM-512 encaps | 54475 cycles | 54376 cycles | 1.00 |
| ML-KEM-512 decaps | 69855 cycles | 69830 cycles | 1.00 |
| ML-KEM-768 keypair | 74173 cycles | 74187 cycles | 1.00 |
| ML-KEM-768 encaps | 86050 cycles | 86041 cycles | 1.00 |
| ML-KEM-768 decaps | 106672 cycles | 106532 cycles | 1.00 |
| ML-KEM-1024 keypair | 112123 cycles | 112130 cycles | 1.00 |
| ML-KEM-1024 encaps | 124717 cycles | 124654 cycles | 1.00 |
| ML-KEM-1024 decaps | 150632 cycles | 150714 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18675 cycles | 18639 cycles | 1.00 |
| ML-KEM-512 encaps | 21889 cycles | 21878 cycles | 1.00 |
| ML-KEM-512 decaps | 28890 cycles | 28864 cycles | 1.00 |
| ML-KEM-768 keypair | 31630 cycles | 31545 cycles | 1.00 |
| ML-KEM-768 encaps | 34788 cycles | 34776 cycles | 1.00 |
| ML-KEM-768 decaps | 44839 cycles | 44778 cycles | 1.00 |
| ML-KEM-1024 keypair | 46069 cycles | 46079 cycles | 1.00 |
| ML-KEM-1024 encaps | 51492 cycles | 51492 cycles | 1 |
| ML-KEM-1024 decaps | 65005 cycles | 65023 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4 (no-opt)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 35504 cycles | 35410 cycles | 1.00 |
| ML-KEM-512 encaps | 40175 cycles | 40114 cycles | 1.00 |
| ML-KEM-512 decaps | 51132 cycles | 51139 cycles | 1.00 |
| ML-KEM-768 keypair | 56800 cycles | 56670 cycles | 1.00 |
| ML-KEM-768 encaps | 64827 cycles | 65147 cycles | 1.00 |
| ML-KEM-768 decaps | 78931 cycles | 79294 cycles | 1.00 |
| ML-KEM-1024 keypair | 87846 cycles | 87857 cycles | 1.00 |
| ML-KEM-1024 encaps | 97109 cycles | 96871 cycles | 1.00 |
| ML-KEM-1024 decaps | 115956 cycles | 115822 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28265 cycles | 28220 cycles | 1.00 |
| ML-KEM-512 encaps | 34157 cycles | 34107 cycles | 1.00 |
| ML-KEM-512 decaps | 44377 cycles | 44335 cycles | 1.00 |
| ML-KEM-768 keypair | 47618 cycles | 47614 cycles | 1.00 |
| ML-KEM-768 encaps | 53934 cycles | 53937 cycles | 1.00 |
| ML-KEM-768 decaps | 68340 cycles | 68365 cycles | 1.00 |
| ML-KEM-1024 keypair | 70248 cycles | 70248 cycles | 1 |
| ML-KEM-1024 encaps | 78733 cycles | 78728 cycles | 1.00 |
| ML-KEM-1024 decaps | 98416 cycles | 98444 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3 (no-opt)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39016 cycles | 38886 cycles | 1.00 |
| ML-KEM-512 encaps | 44562 cycles | 44589 cycles | 1.00 |
| ML-KEM-512 decaps | 56630 cycles | 56665 cycles | 1.00 |
| ML-KEM-768 keypair | 62456 cycles | 62296 cycles | 1.00 |
| ML-KEM-768 encaps | 71385 cycles | 72308 cycles | 0.99 |
| ML-KEM-768 decaps | 86856 cycles | 87700 cycles | 0.99 |
| ML-KEM-1024 keypair | 96224 cycles | 96159 cycles | 1.00 |
| ML-KEM-1024 encaps | 106363 cycles | 106136 cycles | 1.00 |
| ML-KEM-1024 decaps | 126811 cycles | 126585 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28238 cycles | 28274 cycles | 1.00 |
| ML-KEM-512 encaps | 34161 cycles | 34125 cycles | 1.00 |
| ML-KEM-512 decaps | 44340 cycles | 44382 cycles | 1.00 |
| ML-KEM-768 keypair | 47638 cycles | 47672 cycles | 1.00 |
| ML-KEM-768 encaps | 53920 cycles | 53906 cycles | 1.00 |
| ML-KEM-768 decaps | 68398 cycles | 68361 cycles | 1.00 |
| ML-KEM-1024 keypair | 70366 cycles | 70253 cycles | 1.00 |
| ML-KEM-1024 encaps | 78752 cycles | 78754 cycles | 1.00 |
| ML-KEM-1024 decaps | 98551 cycles | 98440 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
SpacemiT K1 8 (Banana Pi F3) benchmarks
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 155506 cycles | 155497 cycles | 1.00 |
| ML-KEM-512 encaps | 163419 cycles | 163389 cycles | 1.00 |
| ML-KEM-512 decaps | 206655 cycles | 206624 cycles | 1.00 |
| ML-KEM-768 keypair | 249866 cycles | 249882 cycles | 1.00 |
| ML-KEM-768 encaps | 270396 cycles | 270411 cycles | 1.00 |
| ML-KEM-768 decaps | 332775 cycles | 332827 cycles | 1.00 |
| ML-KEM-1024 keypair | 395754 cycles | 395617 cycles | 1.00 |
| ML-KEM-1024 encaps | 423609 cycles | 422610 cycles | 1.00 |
| ML-KEM-1024 decaps | 507214 cycles | 506225 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59230 cycles | 59139 cycles | 1.00 |
| ML-KEM-512 encaps | 68655 cycles | 68634 cycles | 1.00 |
| ML-KEM-512 decaps | 87379 cycles | 87351 cycles | 1.00 |
| ML-KEM-768 keypair | 95187 cycles | 95327 cycles | 1.00 |
| ML-KEM-768 encaps | 109224 cycles | 109878 cycles | 0.99 |
| ML-KEM-768 decaps | 134022 cycles | 134352 cycles | 1.00 |
| ML-KEM-1024 keypair | 146800 cycles | 148090 cycles | 0.99 |
| ML-KEM-1024 encaps | 162753 cycles | 163969 cycles | 0.99 |
| ML-KEM-1024 decaps | 194331 cycles | 195624 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59765 cycles | 59775 cycles | 1.00 |
| ML-KEM-512 encaps | 67418 cycles | 67520 cycles | 1.00 |
| ML-KEM-512 decaps | 86122 cycles | 86168 cycles | 1.00 |
| ML-KEM-768 keypair | 97488 cycles | 97434 cycles | 1.00 |
| ML-KEM-768 encaps | 110849 cycles | 110991 cycles | 1.00 |
| ML-KEM-768 decaps | 138197 cycles | 138336 cycles | 1.00 |
| ML-KEM-1024 keypair | 155074 cycles | 154826 cycles | 1.00 |
| ML-KEM-1024 encaps | 172210 cycles | 172753 cycles | 1.00 |
| ML-KEM-1024 decaps | 209438 cycles | 208701 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
CBMC Results (ML-KEM-1024): Full Results (191 proofs)
CBMC Results (ML-KEM-512): Full Results (191 proofs)
CBMC Results (ML-KEM-768): Full Results (191 proofs)
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
| Benchmark suite | Current: 12036a9 | Previous: db75353 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 50711 cycles | 50835 cycles | 1.00 |
| ML-KEM-512 encaps | 59128 cycles | 58693 cycles | 1.01 |
| ML-KEM-512 decaps | 75104 cycles | 74946 cycles | 1.00 |
| ML-KEM-768 keypair | 86483 cycles | 86333 cycles | 1.00 |
| ML-KEM-768 encaps | 94967 cycles | 95399 cycles | 1.00 |
| ML-KEM-768 decaps | 117949 cycles | 119161 cycles | 0.99 |
| ML-KEM-1024 keypair | 129328 cycles | 130734 cycles | 0.99 |
| ML-KEM-1024 encaps | 141943 cycles | 143257 cycles | 0.99 |
| ML-KEM-1024 decaps | 174407 cycles | 173329 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Add a little-endian POWER (ppc64le) native backend targeting POWER8 / ISA 2.07 and above, implementing the forward NTT, inverse NTT, Montgomery reduction, and poly_tomont in vector assembly.
Extend scripts/autogen and scripts/simpasm to support the ppc64le backend (including auto-generated consts), add the development sources under dev/ppc64le, and wire up CI and the cross toolchain.
Co-authored-by: Basil Hess <bhe@zurich.ibm.com>
Signed-off-by: Danny Tsen <dtsen@us.ibm.com>
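For readers unfamiliar with the arithmetic the backend vectorizes, here is a scalar sketch of signed Montgomery reduction and the per-coefficient Montgomery conversion for the ML-KEM modulus q = 3329. It follows the textbook formulation and is not the PR's vector assembly; the helper names `to_int16`, `montgomery_reduce`, and `tomont_coeff` are illustrative assumptions.

```python
Q = 3329                # ML-KEM modulus
QINV = 62209            # q^-1 mod 2^16
R2 = pow(2, 32, Q)      # (2^16)^2 mod q, used to enter the Montgomery domain


def to_int16(x):
    """Interpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def montgomery_reduce(a):
    """Return r with r * 2^16 == a (mod q); |r| < q when |a| < q * 2^15."""
    t = to_int16(a * QINV)      # t == a * q^-1 (mod 2^16)
    return (a - t * Q) >> 16    # a - t*q is divisible by 2^16, so the shift is exact


def tomont_coeff(a):
    """Map one coefficient a to a representative of a * 2^16 mod q."""
    return montgomery_reduce(a * R2)


# Illustrative check on one value.
r = montgomery_reduce(123456)
print((r * (1 << 16) - 123456) % Q == 0)
```

A vector implementation would apply the same steps lane-wise across a register; the scalar form is only meant to make the congruences easy to check.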
Comments and renames; no semantic change.
- VSX register-naming primer; INTT VSR constant-stash explained.
- SAVE_REGS frame layout, byte-offset convention, INTT layer-4
zeta rewinds, barrett_fqmul_4x NTT/INTT asymmetry documented.
- barrett_reduce_4x entry-block: "Restore" -> "Materialize" (it
pulls VR copies from the VSR stash).
- Rename layer macros to CT_Butterfly_*_len{2,4} / GS_Butterfly_*
so names describe shape, not hard-coded layer numbers. Lift
AddSub_4x into the wrappers so all 7 layers share the same
explicit load->mul->addsub->store sequence.
- Rename Load/Write helpers for symmetry: LoadPermL{24,44},
PermStoreL{24,44}, Load_4_{High,Low,Both}, Write_4_{Low,High},
Write_8X.
- Extract SaveLow/RecoverLow/RestoreGlobals helpers in INTT.
- Introduce VS_*_STASH defines for the long-term VSR slots.
Auto-derived copies under mlkem/src/native/ppc64le synced in a follow-up.
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
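The renamed macros refer to the two standard NTT butterfly shapes, Cooley-Tukey (CT) for the forward transform and Gentleman-Sande (GS) for the inverse. As a scalar illustration of the mul -> addsub step, the sketch below uses plain modular arithmetic instead of the Barrett or Montgomery multiplication an optimized backend would use. The function names are hypothetical, and the sign convention chosen for the GS difference is one of several in use.

```python
Q = 3329  # ML-KEM modulus


def ct_butterfly(a, b, zeta):
    """Cooley-Tukey: multiply first, then add/sub. Returns (a + zeta*b, a - zeta*b) mod q."""
    t = (zeta * b) % Q
    return (a + t) % Q, (a - t) % Q


def gs_butterfly(x, y, zeta_inv):
    """Gentleman-Sande: add/sub first, then multiply. Returns (x + y, zeta_inv*(x - y)) mod q."""
    return (x + y) % Q, (zeta_inv * (x - y)) % Q


zeta = 17                       # a primitive 256th root of unity mod q
zeta_inv = pow(zeta, Q - 2, Q)  # inverse via Fermat's little theorem (q is prime)

x, y = ct_butterfly(5, 7, zeta)
print(gs_butterfly(x, y, zeta_inv))  # (10, 14): the inputs scaled by 2
```

Each GS layer undoes the matching CT layer up to a factor of 2, which an inverse NTT removes with a final scaling.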
Running the full CI on #1648