
HOL-Light: add rej_uniform_eta proofs for AArch64#1040

Open
jakemas wants to merge 6 commits into main from add-hol-light-rej-uniform-eta4

@jakemas
Contributor

@jakemas jakemas commented Apr 14, 2026

Resolves #924

Some (development) notes:

I've moved the theorems that may be shared with a future x86 proof to mldsa_specs.ml. If more turns out to be shareable while developing the x86 proof, we can pull what we need at that point.

The CI gives a nice breakdown of how long each of the CORRECT and MEMSAFE proofs takes. I was able to optimize the full eta2 proof quite a bit, from 1h7m to just 31m, which is great considering the CORRECT proof alone takes ~24 min.

CORRECT proof:
HOL-Light / AArch64 HOL Light proof for rej_uniform_eta4_aarch64_asm.S (pull_request) Successful in 11m
HOL-Light / AArch64 HOL Light proof for rej_uniform_eta2_aarch64_asm.S (pull_request) Successful in 24m

CORRECT + MEMSAFE proof:
HOL-Light / AArch64 HOL Light proof for rej_uniform_eta4_aarch64_asm.S (pull_request) Successful in 21m
HOL-Light / AArch64 HOL Light proof for rej_uniform_eta2_aarch64_asm.S (pull_request) Successful in 1h7m

Optimizations CORRECT + MEMSAFE proof:
HOL-Light / AArch64 HOL Light proof for rej_uniform_eta4_aarch64_asm.S succeeded in 18m 40s
HOL-Light / AArch64 HOL Light proof for rej_uniform_eta2_aarch64_asm.S succeeded in 31m 29s
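For a quick sense of scale, the CORRECT + MEMSAFE timings above can be converted to seconds and compared: eta2 goes from 1h7m (4020 s) to 31m 29s (1889 s), roughly a 53% reduction, and eta4 goes from 21m (1260 s) to 18m 40s (1120 s), roughly 11%. The helper below is a minimal sketch of that arithmetic; the function names are hypothetical, and it treats the CI's rounded "1h7m" and "21m" figures as exact.

```c
/* Wall-clock duration in seconds. */
static int to_secs(int h, int m, int s) { return 3600 * h + 60 * m + s; }

/* Percentage by which 'after' is shorter than 'before'. */
static double reduction_pct(int before, int after) {
  return 100.0 * (double)(before - after) / (double)before;
}
```

For example, `reduction_pct(to_secs(1, 7, 0), to_secs(0, 31, 29))` evaluates to about 53.0.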

Built with Claude Opus 4.7 1m, using the HOL Light MCP. About 4 weeks of effort.

@jakemas jakemas requested a review from a team as a code owner April 14, 2026 21:41
@jakemas jakemas marked this pull request as draft April 14, 2026 21:41
@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch 7 times, most recently from a3673e9 to 0213d92 April 14, 2026 22:00
@oqs-bot
Contributor

oqs-bot commented Apr 14, 2026

CBMC Results (ML-DSA-44)

Full Results (201 proofs)
Proof Status Current Previous Change
**TOTAL** 1850s 2043s -9.4%
mld_invntt_layer 254s 298s -15%
poly_pointwise_montgomery_c 155s 184s -16%
polyvecl_pointwise_acc_montgomery_c 153s 184s -17%
rej_uniform_native 136s 153s -11%
mld_attempt_signature_generation 68s 77s -12%
mld_ct_memcmp 67s 81s -17%
mld_ntt_layer 47s 53s -11%
fqmul 42s 47s -11%
polyvec_matrix_expand 32s 34s -6%
sign_verify_internal 28s 28s +0%
sign_signature_internal 24s 25s -4%
keccakf1600x4_permute_native 23s 25s -8%
rej_uniform 23s 22s +5%
rej_uniform_c 19s 20s -5%
compute_pack_t0_t1 18s 19s -5%
polyt0_unpack 18s 16s +12%
mld_check_pct 16s 17s -6%
poly_chknorm_c 15s 18s -17%
polyvec_matrix_pointwise_montgomery_yvec 15s 19s -21%
polyvecl_chknorm 15s 21s -29%
poly_uniform_4x 14s 14s +0%
poly_uniform_eta_4x 13s 14s -7%
polyz_unpack_c 13s 14s -7%
mld_ntt_butterfly_block 12s 14s -14%
poly_add 11s 12s -8%
polyeta_unpack 11s 15s -27%
polyvec_matrix_expand_serial 10s 11s -9%
keccak_absorb_once_x4 9s 10s -10%
mld_keccakf1600_permute_c 9s 7s +29%
mld_compute_pack_z 8s 10s -20%
polyveck_decompose 8s 7s +14%
keccak_absorb 7s 7s +0%
poly_invntt_tomont_c 7s 8s -12%
sign 7s 7s +0%
pointwise_acc_native_aarch64 6s 6s +0%
pointwise_acc_native_x86_64 6s 4s +50%
poly_decompose_c 6s 8s -25%
poly_power2round 6s 7s -14%
polyveck_caddq 6s 6s +0%
polyveck_chknorm 6s 7s -14%
polyveck_invntt_tomont 6s 4s +50%
polyvecl_unpack_z 6s 5s +20%
polyz_pack 6s 5s +20%
sign_pk_from_sk 6s 8s -25%
sign_signature_pre_hash_internal 6s 5s +20%
sign_verify_extmu 6s 4s +50%
keccakf1600x4_xor_bytes 5s 2s +150%
mld_ct_get_optblocker_u8 5s 2s +150%
pack_sk_rho_key_tr_s2 5s 4s +25%
pack_sk_s1 5s 3s +67%
pointwise_native_x86_64 5s 5s +0%
poly_caddq_c 5s 4s +25%
poly_chknorm_native_aarch64 5s 3s +67%
poly_permute_bitrev_to_custom_optional_native 5s 1s +400%
poly_pointwise_montgomery_native 5s 2s +150%
poly_uniform_eta 5s 7s -29%
poly_uniform_gamma1_4x 5s 6s -17%
poly_use_hint_native 5s 3s +67%
polyt1_unpack 5s 4s +25%
polyveck_ntt 5s 4s +25%
polyveck_pack_w1 5s 3s +67%
polyvecl_ntt 5s 3s +67%
rej_eta_c 5s 3s +67%
sign_keypair 5s 3s +67%
sign_keypair_internal 5s 5s +0%
sign_open 5s 5s +0%
use_hint 5s 4s +25%
caddq 4s 4s +0%
intt_native_x86_64 4s 2s +100%
keccak_finalize 4s 2s +100%
keccak_init 4s 6s -33%
keccakf1600_permute 4s 4s +0%
make_hint 4s 3s +33%
mld_ct_cmask_nonzero_u32 4s 5s -20%
mld_ct_get_optblocker_i64 4s 4s +0%
mld_keccakf1600x4_xor_bytes_c 4s 1s +300%
montgomery_reduce 4s 2s +100%
ntt_native_aarch64 4s 3s +33%
pointwise_native_aarch64 4s 3s +33%
poly_caddq_native 4s 3s +33%
poly_caddq_native_aarch64 4s 2s +100%
poly_decompose 4s 2s +100%
poly_use_hint 4s 3s +33%
polyeta_pack 4s 3s +33%
polyt1_pack 4s 1s +300%
polyvec_matrix_pointwise_montgomery_row 4s 2s +100%
polyveck_pack_eta 4s 3s +33%
polyz_unpack_native 4s 6s -33%
rej_eta_native 4s 6s -33%
shake128_squeeze 4s 2s +100%
shake256x4_squeezeblocks 4s 3s +33%
sign_signature 4s 4s +0%
sign_verify 4s 4s +0%
sign_verify_pre_hash_shake256 4s 3s +33%
unpack_sk_t0hat 4s 5s -20%
yvec_get_poly 4s 3s +33%
keccak_f1600_x4_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_squeezeblocks_x4 3s 5s -40%
keccakf1600_permute_native 3s 3s +0%
keccakf1600_xor_bytes 3s 4s -25%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_ct_sel_int32 3s 3s +0%
mld_h 3s 4s -25%
mld_keccakf1600_extract_bytes 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 3s +0%
mld_sample_s1_s2 3s 2s +50%
mld_value_barrier_i64 3s 2s +50%
mld_value_barrier_u8 3s 2s +50%
ntt_native_x86_64 3s 4s -25%
nttunpack_native_x86_64 3s 4s -25%
pack_sig_c 3s 2s +50%
pack_sig_z 3s 4s -25%
poly_caddq 3s 4s -25%
poly_challenge 3s 6s -50%
poly_chknorm 3s 4s -25%
poly_chknorm_native 3s 5s -40%
poly_decompose_native 3s 4s -25%
poly_invntt_tomont 3s 2s +50%
poly_ntt 3s 3s +0%
poly_ntt_native 3s 2s +50%
poly_permute_bitrev_to_custom_optional 3s 2s +50%
poly_sub 3s 3s +0%
poly_uniform 3s 3s +0%
poly_use_hint_native_aarch64 3s 4s -25%
polyveck_reduce 3s 3s +0%
polyveck_unpack_eta 3s 2s +50%
polyvecl_uniform_gamma1 3s 2s +50%
polyw1_pack 3s 2s +50%
polyz_unpack 3s 4s -25%
polyz_unpack_17_native_aarch64 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 3s +0%
rej_eta 3s 2s +50%
rej_uniform_eta_native_aarch64 3s - new
rej_uniform_native_aarch64 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_init 3s 1s +200%
shake256_release 3s 2s +50%
sign_signature_pre_hash_shake256 3s 5s -40%
sk_s1hat_get_poly 3s 2s +50%
sk_s2hat_get_poly 3s 2s +50%
sk_t0hat_get_poly 3s 3s +0%
sys_check_capability 3s 2s +50%
unpack_sk 3s 3s +0%
unpack_sk_s1hat 3s 3s +0%
fqscale 2s 2s +0%
intt_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_squeeze 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes 2s 3s -33%
keccakf1600x4_extract_bytes_native 2s 4s -50%
keccakf1600x4_permute 2s 2s +0%
keccakf1600x4_xor_bytes_native 2s 1s +100%
mld_ct_abs_i32 2s 1s +100%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_keccakf1600x4_extract_bytes_c 2s 3s -33%
mld_polymat_expand_entry 2s 4s -50%
mld_sample_s1_s2_serial 2s 3s -33%
poly_caddq_native_x86_64 2s 5s -60%
poly_decompose_88_native_aarch64 2s 3s -33%
poly_ntt_c 2s 3s -33%
poly_reduce 2s 3s -33%
poly_shiftl 2s 5s -60%
poly_use_hint_c 2s 3s -33%
polyt0_pack 2s 4s -50%
polyvecl_pack_eta 2s 5s -60%
polyvecl_pointwise_acc_montgomery 2s 3s -33%
polyvecl_pointwise_acc_montgomery_native 2s 3s -33%
polyvecl_uniform_gamma1_serial 2s 5s -60%
polyvecl_unpack_eta 2s 3s -33%
power2round 2s 3s -33%
reduce32 2s 4s -50%
shake128_finalize 2s 2s +0%
shake128_release 2s 4s -50%
shake128x4_squeezeblocks 2s 2s +0%
shake256 2s 2s +0%
shake256_absorb 2s 2s +0%
shake256_finalize 2s 2s +0%
shake256_init 2s 4s -50%
shake256_squeeze 2s 2s +0%
sig_unpack_hints 2s 3s -33%
sign_signature_extmu 2s 3s -33%
sign_verify_pre_hash_internal 2s 5s -60%
unpack_pk_t1 2s 3s -33%
unpack_sk_s2hat 2s 3s -33%
yvec_init 2s 2s +0%
decompose 1s 3s -67%
keccakf1600_xor_bytes (big endian) 1s 3s -67%
mld_value_barrier_u32 1s 3s -67%
pack_sig_h 1s 4s -75%
poly_decompose_32_native_aarch64 1s 4s -75%
poly_invntt_tomont_native 1s 3s -67%
poly_pointwise_montgomery 1s 3s -67%
poly_uniform_gamma1 1s 2s -50%
shake128x4_absorb_once 1s 3s -67%
shake256x4_absorb_once 1s 5s -80%
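The Change column in these bot reports appears to be (Current - Previous) / Previous, rounded to a whole percent for each proof and to tenths of a percent for the TOTAL row. Exact ties such as 18s vs 16s (+12.5%) and 7s vs 8s (-12.5%) are shown as +12% and -12%, which suggests round-half-to-even; that rounding rule is inferred from the data, not documented by the bot. A small C sketch that reproduces the reported values (helper names are hypothetical):

```c
/* Round num/den to the nearest integer, ties to even (requires den > 0). */
static long round_half_even(long num, long den) {
  long q = num / den; /* C division truncates toward zero */
  long r = num % den; /* remainder has the sign of num */
  if (r < 0) {        /* switch to floor division so that 0 <= r < den */
    q -= 1;
    r += den;
  }
  /* Now num/den == q + r/den. Round up past the midpoint,
   * or exactly at the midpoint when q is odd. */
  if (2 * r > den || (2 * r == den && q % 2 != 0))
    q += 1;
  return q;
}

/* Relative change of 'current' vs 'previous', scaled by 'scale':
 * scale = 100 gives whole percent (per-proof rows),
 * scale = 1000 gives tenths of a percent (the TOTAL row). */
static long change_scaled(long current, long previous, long scale) {
  return round_half_even(scale * (current - previous), previous);
}
```

For instance, `change_scaled(1850, 2043, 1000)` yields -94, i.e. the -9.4% shown in the TOTAL row above. Rows marked "new" have no Previous value and are not covered by this formula.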

@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch from 0213d92 to 4558665 April 14, 2026 22:03
@oqs-bot
Contributor

oqs-bot commented Apr 14, 2026

CBMC Results (ML-DSA-65)

Full Results (201 proofs)
Proof Status Current Previous Change
**TOTAL** 2537s 2175s +16.6%
polyvecl_pointwise_acc_montgomery_c 419s 329s +27%
mld_invntt_layer 313s 262s +19%
poly_pointwise_montgomery_c 205s 165s +24%
polyvec_matrix_expand 173s 151s +15%
rej_uniform_native 159s 137s +16%
mld_ct_memcmp 86s 68s +26%
mld_attempt_signature_generation 59s 54s +9%
mld_ntt_layer 55s 47s +17%
fqmul 49s 44s +11%
sign_verify_internal 45s 37s +22%
sign_signature_internal 29s 25s +16%
polyvec_matrix_expand_serial 26s 23s +13%
polyvec_matrix_pointwise_montgomery_yvec 24s 20s +20%
keccakf1600x4_permute_native 23s 23s +0%
rej_uniform 23s 20s +15%
rej_uniform_c 20s 17s +18%
poly_chknorm_c 19s 16s +19%
compute_pack_t0_t1 18s 16s +12%
poly_uniform_eta_4x 17s 15s +13%
polyt0_unpack 17s 16s +6%
poly_uniform_4x 16s 14s +14%
mld_ntt_butterfly_block 14s 14s +0%
polyveck_decompose 14s 15s -7%
poly_add 13s 11s +18%
mld_check_pct 12s 11s +9%
polyveck_chknorm 12s 10s +20%
keccak_absorb_once_x4 11s 9s +22%
polyveck_caddq 11s 8s +38%
sign 11s 9s +22%
mld_compute_pack_z 10s 10s +0%
poly_invntt_tomont_c 10s 10s +0%
mld_keccakf1600_permute_c 9s 7s +29%
poly_decompose_c 9s 6s +50%
polyveck_invntt_tomont 9s 8s +12%
keccak_absorb 8s 8s +0%
poly_caddq_c 8s 4s +100%
poly_challenge 8s 5s +60%
polyveck_ntt 8s 9s -11%
mld_sample_s1_s2 7s 7s +0%
pointwise_acc_native_aarch64 7s 8s -12%
polyvecl_ntt 7s 6s +17%
sign_keypair_internal 7s 4s +75%
keccak_squeezeblocks_x4 6s 5s +20%
poly_power2round 6s 4s +50%
poly_uniform_eta 6s 4s +50%
polyvecl_unpack_eta 6s 2s +200%
sign_open 6s 6s +0%
sign_signature_extmu 6s 5s +20%
unpack_sk_s1hat 6s 3s +100%
caddq 5s 2s +150%
intt_native_x86_64 5s 4s +25%
keccakf1600x4_xor_bytes_native 5s 4s +25%
mld_h 5s 3s +67%
mld_prepare_domain_separation_prefix 5s 4s +25%
mld_sample_s1_s2_serial 5s 4s +25%
pack_sig_c 5s 2s +150%
pointwise_acc_native_x86_64 5s 5s +0%
pointwise_native_aarch64 5s 2s +150%
poly_chknorm_native 5s 2s +150%
poly_ntt_native 5s 2s +150%
poly_uniform_gamma1 5s 4s +25%
poly_use_hint_c 5s 4s +25%
polyvecl_chknorm 5s 5s +0%
polyvecl_pointwise_acc_montgomery_native 5s 1s +400%
polyvecl_uniform_gamma1_serial 5s 5s +0%
polyvecl_unpack_z 5s 2s +150%
rej_eta_native 5s 7s -29%
sign_keypair 5s 4s +25%
sign_pk_from_sk 5s 6s -17%
sign_signature 5s 4s +25%
sign_signature_pre_hash_shake256 5s 4s +25%
sign_verify_pre_hash_internal 5s 5s +0%
unpack_pk_t1 5s 2s +150%
unpack_sk_s2hat 5s 3s +67%
keccak_init 4s 2s +100%
keccakf1600x4_extract_bytes 4s 2s +100%
keccakf1600x4_extract_bytes_native 4s 5s -20%
keccakf1600x4_permute 4s 3s +33%
make_hint 4s 3s +33%
mld_ct_abs_i32 4s 3s +33%
mld_ct_sel_int32 4s 3s +33%
mld_value_barrier_i64 4s 2s +100%
mld_value_barrier_u32 4s 1s +300%
ntt_native_x86_64 4s 5s -20%
pack_sig_h 4s 2s +100%
pack_sk_s1 4s 4s +0%
poly_chknorm 4s 4s +0%
poly_decompose_32_native_aarch64 4s 3s +33%
poly_pointwise_montgomery_native 4s 3s +33%
poly_reduce 4s 3s +33%
poly_shiftl 4s 3s +33%
poly_uniform 4s 6s -33%
poly_uniform_gamma1_4x 4s 3s +33%
poly_use_hint 4s 3s +33%
poly_use_hint_native_aarch64 4s 2s +100%
polyt0_pack 4s 5s -20%
polyt1_pack 4s 4s +0%
polyt1_unpack 4s 4s +0%
polyveck_pack_eta 4s 2s +100%
polyveck_pack_w1 4s 4s +0%
polyvecl_pointwise_acc_montgomery 4s 3s +33%
polyvecl_uniform_gamma1 4s 5s -20%
polyz_pack 4s 3s +33%
polyz_unpack_c 4s 5s -20%
rej_eta_c 4s 3s +33%
rej_uniform_eta_native_aarch64 4s - new
rej_uniform_native_aarch64 4s 4s +0%
shake128_absorb 4s 3s +33%
shake128x4_squeezeblocks 4s 3s +33%
shake256 4s 3s +33%
shake256_absorb 4s 1s +300%
sign_verify 4s 6s -33%
sign_verify_extmu 4s 4s +0%
sign_verify_pre_hash_shake256 4s 5s -20%
unpack_sk 4s 2s +100%
unpack_sk_t0hat 4s 4s +0%
decompose 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_squeeze 3s 3s +0%
keccakf1600_permute_native 3s 3s +0%
mld_ct_cmask_nonzero_u8 3s 4s -25%
mld_ct_get_optblocker_i64 3s 1s +200%
mld_keccakf1600x4_extract_bytes_c 3s 3s +0%
mld_keccakf1600x4_xor_bytes_c 3s 3s +0%
mld_value_barrier_u8 3s 4s -25%
ntt_native_aarch64 3s 3s +0%
nttunpack_native_x86_64 3s 4s -25%
pointwise_native_x86_64 3s 4s -25%
poly_caddq_native 3s 2s +50%
poly_caddq_native_aarch64 3s 2s +50%
poly_caddq_native_x86_64 3s 4s -25%
poly_chknorm_native_aarch64 3s 3s +0%
poly_decompose 3s 1s +200%
poly_decompose_88_native_aarch64 3s 4s -25%
poly_ntt 3s 2s +50%
poly_permute_bitrev_to_custom_optional_native 3s 4s -25%
poly_sub 3s 2s +50%
polyeta_unpack 3s 3s +0%
polyvec_matrix_pointwise_montgomery_row 3s 3s +0%
polyveck_unpack_eta 3s 2s +50%
polyvecl_pack_eta 3s 3s +0%
polyw1_pack 3s 5s -40%
polyz_unpack_17_native_aarch64 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 2s +50%
power2round 3s 1s +200%
rej_eta 3s 2s +50%
shake128_init 3s 5s -40%
shake128x4_absorb_once 3s 3s +0%
shake256_init 3s 3s +0%
shake256_squeeze 3s 2s +50%
shake256x4_squeezeblocks 3s 3s +0%
sig_unpack_hints 3s 3s +0%
sign_signature_pre_hash_internal 3s 5s -40%
sys_check_capability 3s 3s +0%
use_hint 3s 2s +50%
fqscale 2s 1s +100%
intt_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 4s -50%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_xor_bytes 2s 4s -50%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 4s -50%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_ct_get_optblocker_u8 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_polymat_expand_entry 2s 2s +0%
montgomery_reduce 2s 4s -50%
pack_sig_z 2s 4s -50%
pack_sk_rho_key_tr_s2 2s 4s -50%
poly_caddq 2s 3s -33%
poly_decompose_native 2s 4s -50%
poly_invntt_tomont 2s 2s +0%
poly_invntt_tomont_native 2s 4s -50%
poly_ntt_c 2s 1s +100%
poly_permute_bitrev_to_custom_optional 2s 3s -33%
poly_pointwise_montgomery 2s 2s +0%
poly_use_hint_native 2s 3s -33%
polyeta_pack 2s 3s -33%
polyveck_reduce 2s 2s +0%
polyz_unpack 2s 2s +0%
polyz_unpack_native 2s 4s -50%
reduce32 2s 4s -50%
shake256_finalize 2s 3s -33%
shake256_release 2s 3s -33%
shake256x4_absorb_once 2s 2s +0%
sk_s1hat_get_poly 2s 3s -33%
sk_s2hat_get_poly 2s 4s -50%
sk_t0hat_get_poly 2s 2s +0%
yvec_get_poly 2s 2s +0%
yvec_init 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_avx2 1s 1s +0%
keccak_finalize 1s 2s -50%
keccakf1600_permute 1s 2s -50%
mld_ct_cmask_neg_i32 1s 2s -50%
shake128_finalize 1s 3s -67%
shake128_release 1s 3s -67%
shake128_squeeze 1s 2s -50%

@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch from 4558665 to 73d8d61 April 14, 2026 22:14
@oqs-bot
Contributor

oqs-bot commented Apr 14, 2026

CBMC Results (ML-DSA-87)

Full Results (201 proofs)
Proof Status Current Previous Change
**TOTAL** 2237s 2720s -17.8%
polyvecl_pointwise_acc_montgomery_c 265s 390s -32%
mld_invntt_layer 253s 323s -22%
polyvec_matrix_expand 176s 213s -17%
poly_pointwise_montgomery_c 157s 228s -31%
rej_uniform_native 138s 161s -14%
sign_verify_internal 92s 105s -12%
mld_attempt_signature_generation 71s 78s -9%
mld_ct_memcmp 70s 93s -25%
mld_ntt_layer 49s 52s -6%
fqmul 41s 54s -24%
polyvec_matrix_expand_serial 38s 43s -12%
compute_pack_t0_t1 33s 38s -13%
sign_signature_internal 32s 37s -14%
rej_uniform 23s 23s +0%
keccakf1600x4_permute_native 21s 25s -16%
poly_chknorm_c 17s 19s -11%
rej_uniform_c 17s 22s -23%
poly_uniform_eta_4x 16s 16s +0%
polyt0_unpack 16s 20s -20%
polyvec_matrix_pointwise_montgomery_yvec 15s 14s +7%
polyveck_decompose 14s 12s +17%
mld_ntt_butterfly_block 12s 16s -25%
poly_uniform_4x 12s 14s -14%
polyeta_unpack 11s 14s -21%
polyveck_invntt_tomont 11s 10s +10%
mld_check_pct 10s 13s -23%
poly_add 10s 14s -29%
polyveck_caddq 10s 8s +25%
unpack_sk_t0hat 10s 8s +25%
keccak_absorb_once_x4 9s 11s -18%
mld_compute_pack_z 9s 8s +12%
mld_sample_s1_s2_serial 9s 8s +12%
sign 9s 10s -10%
sign_pk_from_sk 9s 10s -10%
poly_invntt_tomont_c 8s 10s -20%
keccak_absorb 7s 7s +0%
keccak_squeezeblocks_x4 7s 6s +17%
poly_power2round 7s 9s -22%
polyvecl_ntt 7s 10s -30%
intt_native_aarch64 6s 3s +100%
keccakf1600x4_permute 6s 2s +200%
mld_ct_cmask_nonzero_u32 6s 4s +50%
mld_keccakf1600_permute_c 6s 8s -25%
pointwise_acc_native_aarch64 6s 9s -33%
pointwise_acc_native_x86_64 6s 7s -14%
poly_challenge 6s 6s +0%
polyt0_pack 6s 5s +20%
polyvecl_pack_eta 6s 3s +100%
polyz_unpack_c 6s 7s -14%
mld_ct_cmask_nonzero_u8 5s 2s +150%
mld_prepare_domain_separation_prefix 5s 6s -17%
poly_caddq_c 5s 4s +25%
poly_chknorm_native_aarch64 5s 3s +67%
poly_permute_bitrev_to_custom_optional_native 5s 2s +150%
poly_sub 5s 3s +67%
polyveck_chknorm 5s 7s -29%
polyveck_unpack_eta 5s 4s +25%
polyvecl_chknorm 5s 5s +0%
polyvecl_pointwise_acc_montgomery_native 5s 7s -29%
rej_eta_native 5s 5s +0%
shake128x4_squeezeblocks 5s 2s +150%
shake256_finalize 5s 3s +67%
sign_keypair_internal 5s 4s +25%
sign_signature_pre_hash_shake256 5s 8s -38%
sign_verify_pre_hash_internal 5s 4s +25%
use_hint 5s 3s +67%
fqscale 4s 6s -33%
keccakf1600_xor_bytes (big endian) 4s 1s +300%
keccakf1600x4_xor_bytes 4s 3s +33%
mld_keccakf1600_extract_bytes 4s 1s +300%
mld_polymat_expand_entry 4s 3s +33%
mld_sample_s1_s2 4s 6s -33%
pack_sk_rho_key_tr_s2 4s 3s +33%
poly_caddq_native 4s 6s -33%
poly_caddq_native_x86_64 4s 3s +33%
poly_chknorm_native 4s 5s -20%
poly_decompose_c 4s 7s -43%
poly_decompose_native 4s 3s +33%
poly_invntt_tomont 4s 2s +100%
poly_invntt_tomont_native 4s 4s +0%
poly_ntt 4s 4s +0%
poly_ntt_native 4s 5s -20%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_shiftl 4s 4s +0%
poly_uniform_gamma1_4x 4s 5s -20%
poly_use_hint_c 4s 8s -50%
polyveck_ntt 4s 4s +0%
polyvecl_pointwise_acc_montgomery 4s 3s +33%
polyw1_pack 4s 5s -20%
polyz_unpack 4s 6s -33%
polyz_unpack_17_native_aarch64 4s 3s +33%
shake128x4_absorb_once 4s 3s +33%
sign_keypair 4s 6s -33%
sign_signature 4s 3s +33%
sign_signature_extmu 4s 3s +33%
sign_signature_pre_hash_internal 4s 6s -33%
sign_verify 4s 6s -33%
sign_verify_pre_hash_shake256 4s 6s -33%
sk_s2hat_get_poly 4s 3s +33%
unpack_pk_t1 4s 3s +33%
unpack_sk_s1hat 4s 5s -20%
caddq 3s 3s +0%
decompose 3s 2s +50%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 3s +0%
keccakf1600_permute 3s 2s +50%
keccakf1600_permute_native 3s 2s +50%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_extract_bytes 3s 4s -25%
keccakf1600x4_xor_bytes_native 3s 2s +50%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_ct_sel_int32 3s 3s +0%
mld_keccakf1600x4_extract_bytes_c 3s 2s +50%
mld_value_barrier_u32 3s 4s -25%
ntt_native_x86_64 3s 2s +50%
pack_sig_h 3s 2s +50%
pointwise_native_aarch64 3s 3s +0%
pointwise_native_x86_64 3s 2s +50%
poly_chknorm 3s 4s -25%
poly_decompose 3s 3s +0%
poly_decompose_32_native_aarch64 3s 5s -40%
poly_pointwise_montgomery 3s 2s +50%
poly_reduce 3s 2s +50%
poly_uniform 3s 4s -25%
poly_use_hint 3s 3s +0%
poly_use_hint_native 3s 2s +50%
poly_use_hint_native_aarch64 3s 5s -40%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 2s +50%
reduce32 3s 1s +200%
rej_eta 3s 3s +0%
rej_eta_c 3s 6s -50%
rej_uniform_eta_native_aarch64 3s - new
rej_uniform_native_aarch64 3s 3s +0%
shake128_finalize 3s 1s +200%
shake128_squeeze 3s 3s +0%
shake256_absorb 3s 4s -25%
shake256x4_squeezeblocks 3s 2s +50%
sig_unpack_hints 3s 3s +0%
sign_open 3s 4s -25%
sign_verify_extmu 3s 6s -50%
unpack_sk 3s 5s -40%
yvec_get_poly 3s 4s -25%
keccakf1600x4_extract_bytes_native 2s 2s +0%
make_hint 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_h 2s 7s -71%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_value_barrier_i64 2s 3s -33%
ntt_native_aarch64 2s 5s -60%
nttunpack_native_x86_64 2s 2s +0%
pack_sig_c 2s 5s -60%
pack_sig_z 2s 5s -60%
pack_sk_s1 2s 5s -60%
poly_caddq 2s 5s -60%
poly_caddq_native_aarch64 2s 4s -50%
poly_ntt_c 2s 4s -50%
poly_pointwise_montgomery_native 2s 3s -33%
poly_uniform_eta 2s 2s +0%
poly_uniform_gamma1 2s 3s -33%
polyeta_pack 2s 3s -33%
polyt1_pack 2s 2s +0%
polyveck_pack_eta 2s 2s +0%
polyveck_pack_w1 2s 3s -33%
polyvecl_uniform_gamma1 2s 6s -67%
polyvecl_unpack_eta 2s 4s -50%
polyvecl_unpack_z 2s 6s -67%
polyz_pack 2s 2s +0%
polyz_unpack_native 2s 2s +0%
power2round 2s 3s -33%
shake128_init 2s 2s +0%
shake128_release 2s 2s +0%
shake256 2s 4s -50%
shake256_init 2s 3s -33%
shake256_release 2s 2s +0%
shake256x4_absorb_once 2s 2s +0%
sk_s1hat_get_poly 2s 2s +0%
sk_t0hat_get_poly 2s 2s +0%
sys_check_capability 2s 5s -60%
unpack_sk_s2hat 2s 2s +0%
yvec_init 2s 3s -33%
keccak_f1600_x1_native_aarch64 1s 3s -67%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
keccak_f1600_x4_native_avx2 1s 2s -50%
keccak_finalize 1s 1s +0%
keccak_init 1s 3s -67%
keccak_squeeze 1s 2s -50%
mld_ct_abs_i32 1s 2s -50%
mld_value_barrier_u8 1s 3s -67%
montgomery_reduce 1s 1s +0%
poly_decompose_88_native_aarch64 1s 4s -75%
polyt1_unpack 1s 4s -75%
polyvec_matrix_pointwise_montgomery_row 1s 2s -50%
polyveck_reduce 1s 4s -75%
shake128_absorb 1s 2s -50%
shake256_squeeze 1s 2s -50%

@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch from 73d8d61 to a7cc582 April 15, 2026 11:06
@jakemas jakemas changed the title Add HOL Light rej_uniform_eta4 proof for AArch64 Add HOL Light rej_uniform_eta proofs for AArch64 Apr 15, 2026
@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch 9 times, most recently from 77cb935 to f986df8 April 15, 2026 17:09
@mkannwischer mkannwischer self-assigned this Apr 22, 2026
@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch 3 times, most recently from 37dfafd to ddd97de April 24, 2026 20:10
@oqs-bot
Contributor

oqs-bot commented Apr 24, 2026

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (201 proofs)
Proof Status Current Previous Change
**TOTAL** 1860s 1646s +13.0%
poly_pointwise_montgomery_c 237s 185s +28%
polyvec_matrix_pointwise_montgomery_yvec 216s 167s +29%
mld_invntt_layer 211s 180s +17%
rej_uniform_native 130s 114s +14%
mld_ct_memcmp 85s 68s +25%
mld_ntt_layer 51s 45s +13%
fqmul 47s 41s +15%
sign_verify_internal 27s 21s +29%
mld_attempt_signature_generation 25s 21s +19%
keccakf1600x4_permute_native 22s 24s -8%
polyt0_unpack 21s 19s +11%
polyeta_unpack 18s 17s +6%
rej_uniform_c 17s 15s +13%
polyz_unpack_c 16s 14s +14%
mld_check_pct 15s 13s +15%
poly_chknorm_c 15s 14s +7%
mld_ntt_butterfly_block 13s 14s -7%
poly_invntt_tomont_c 13s 8s +62%
poly_uniform_eta_4x 13s 12s +8%
rej_uniform 13s 11s +18%
poly_add 12s 12s +0%
polyveck_chknorm 12s 13s -8%
keccak_absorb_once_x4 10s 11s -9%
sign 10s 7s +43%
mld_compute_pack_z 9s 6s +50%
poly_decompose_c 9s 7s +29%
poly_shiftl 9s 5s +80%
polyvec_matrix_pointwise_montgomery_row 9s 7s +29%
compute_pack_t0_t1 8s 11s -27%
mld_keccakf1600_permute_c 8s 6s +33%
pointwise_acc_native_x86_64 8s 6s +33%
poly_caddq_c 8s 7s +14%
poly_power2round 8s 8s +0%
keccak_absorb 6s 6s +0%
mld_sample_s1_s2_serial 6s 3s +100%
pointwise_acc_native_aarch64 6s 7s -14%
poly_uniform_gamma1 6s 3s +100%
polyveck_decompose 6s 3s +100%
polyveck_reduce 6s 8s -25%
polyvecl_ntt 6s 5s +20%
sign_pk_from_sk 6s 6s +0%
sign_signature_internal 6s 8s -25%
mld_prepare_domain_separation_prefix 5s 5s +0%
mld_sample_s1_s2 5s 5s +0%
ntt_native_aarch64 5s 3s +67%
nttunpack_native_x86_64 5s 4s +25%
pack_sig_h 5s 6s -17%
pack_sig_z 5s 3s +67%
poly_caddq_native_aarch64 5s 4s +25%
poly_chknorm_native 5s 4s +25%
poly_use_hint_native 5s 2s +150%
polyvecl_uniform_gamma1_serial 5s 5s +0%
polyz_unpack 5s 2s +150%
rej_eta_c 5s 2s +150%
rej_uniform_eta_native_aarch64 5s - new
rej_uniform_native_aarch64 5s 3s +67%
sig_unpack_hints 5s 4s +25%
sign_signature 5s 3s +67%
decompose 4s 3s +33%
intt_native_aarch64 4s 3s +33%
keccak_init 4s 5s -20%
keccak_squeeze 4s 2s +100%
keccakf1600_permute 4s 2s +100%
mld_ct_cmask_neg_i32 4s 5s -20%
mld_ct_cmask_nonzero_u8 4s 3s +33%
mld_ct_sel_int32 4s 3s +33%
mld_keccakf1600x4_xor_bytes_c 4s 2s +100%
pack_sig_c 4s 4s +0%
pack_sk_s1 4s 3s +33%
poly_chknorm_native_aarch64 4s 2s +100%
poly_invntt_tomont 4s 4s +0%
poly_ntt_c 4s 5s -20%
poly_permute_bitrev_to_custom_optional_native 4s 3s +33%
poly_sub 4s 3s +33%
polyeta_pack 4s 2s +100%
polyveck_caddq 4s 4s +0%
polyvecl_chknorm 4s 6s -33%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_unpack_z 4s 2s +100%
polyz_unpack_19_native_aarch64 4s 3s +33%
rej_eta_native 4s 3s +33%
shake128_init 4s 2s +100%
shake256_init 4s 4s +0%
sign_keypair 4s 3s +33%
sign_keypair_internal 4s 5s -20%
sign_open 4s 6s -33%
sign_signature_extmu 4s 6s -33%
sign_verify_extmu 4s 3s +33%
sign_verify_pre_hash_internal 4s 2s +100%
sk_s1hat_get_poly 4s 3s +33%
sk_s2hat_get_poly 4s 3s +33%
sys_check_capability 4s 1s +300%
unpack_sk_s1hat 4s 2s +100%
caddq 3s 3s +0%
intt_native_x86_64 3s 8s -62%
keccak_f1600_x1_native_aarch64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_finalize 3s 3s +0%
keccak_squeezeblocks_x4 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_permute_native 3s 2s +50%
keccakf1600_xor_bytes 3s 3s +0%
keccakf1600x4_permute 3s 4s -25%
keccakf1600x4_xor_bytes 3s 2s +50%
make_hint 3s 3s +0%
mld_ct_get_optblocker_u8 3s 5s -40%
mld_h 3s 3s +0%
mld_polymat_expand_entry 3s 3s +0%
montgomery_reduce 3s 2s +50%
ntt_native_x86_64 3s 2s +50%
pack_sk_rho_key_tr_s2 3s 5s -40%
pointwise_native_aarch64 3s 2s +50%
poly_caddq_native 3s 3s +0%
poly_chknorm 3s 5s -40%
poly_decompose 3s 1s +200%
poly_decompose_32_native_aarch64 3s 3s +0%
poly_ntt_native 3s 2s +50%
poly_pointwise_montgomery 3s 4s -25%
poly_uniform 3s 5s -40%
poly_uniform_eta 3s 3s +0%
poly_uniform_gamma1_4x 3s 1s +200%
poly_use_hint_c 3s 2s +50%
polyt0_pack 3s 6s -50%
polyvec_matrix_expand 3s 2s +50%
polyveck_invntt_tomont 3s 4s -25%
polyvecl_pointwise_acc_montgomery_c 3s 2s +50%
polyvecl_pointwise_acc_montgomery_native 3s 2s +50%
polyvecl_unpack_eta 3s 1s +200%
polyz_pack 3s 5s -40%
power2round 3s 3s +0%
reduce32 3s 1s +200%
shake128_finalize 3s 2s +50%
shake128x4_squeezeblocks 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_finalize 3s 2s +50%
shake256_release 3s 1s +200%
shake256x4_absorb_once 3s 3s +0%
sign_signature_pre_hash_internal 3s 4s -25%
sign_verify 3s 4s -25%
sk_t0hat_get_poly 3s 4s -25%
unpack_pk_t1 3s 4s -25%
unpack_sk 3s 2s +50%
unpack_sk_s2hat 3s 2s +50%
unpack_sk_t0hat 3s 4s -25%
use_hint 3s 4s -25%
yvec_get_poly 3s 2s +50%
yvec_init 3s 1s +200%
fqscale 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes 2s 4s -50%
keccakf1600x4_extract_bytes_native 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 4s -50%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_keccakf1600x4_extract_bytes_c 2s 1s +100%
mld_value_barrier_u32 2s 2s +0%
mld_value_barrier_u8 2s 3s -33%
pointwise_native_x86_64 2s 4s -50%
poly_caddq_native_x86_64 2s 3s -33%
poly_challenge 2s 4s -50%
poly_decompose_88_native_aarch64 2s 4s -50%
poly_decompose_native 2s 6s -67%
poly_ntt 2s 3s -33%
poly_permute_bitrev_to_custom_optional 2s 4s -50%
poly_pointwise_montgomery_native 2s 2s +0%
poly_reduce 2s 2s +0%
poly_uniform_4x 2s 3s -33%
poly_use_hint 2s 3s -33%
poly_use_hint_native_aarch64 2s 3s -33%
polyt1_pack 2s 2s +0%
polyt1_unpack 2s 2s +0%
polyvec_matrix_expand_serial 2s 3s -33%
polyveck_ntt 2s 4s -50%
polyveck_pack_eta 2s 3s -33%
polyveck_pack_w1 2s 6s -67%
polyveck_unpack_eta 2s 2s +0%
polyvecl_pack_eta 2s 2s +0%
polyvecl_pointwise_acc_montgomery 2s 5s -60%
polyw1_pack 2s 2s +0%
polyz_unpack_17_native_aarch64 2s 3s -33%
shake128_absorb 2s 1s +100%
shake128_release 2s 2s +0%
shake256x4_squeezeblocks 2s 4s -50%
sign_signature_pre_hash_shake256 2s 6s -67%
sign_verify_pre_hash_shake256 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 1s 1s +0%
keccakf1600x4_xor_bytes_native 1s 1s +0%
mld_value_barrier_i64 1s 3s -67%
poly_caddq 1s 4s -75%
poly_invntt_tomont_native 1s 2s -50%
polyz_unpack_native 1s 2s -50%
rej_eta 1s 5s -80%
shake128_squeeze 1s 2s -50%
shake128x4_absorb_once 1s 5s -80%
shake256 1s 5s -80%
shake256_squeeze 1s 2s -50%

@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch 5 times, most recently from 7429b71 to 0c8ed0a May 22, 2026 00:03
@jakemas jakemas marked this pull request as ready for review May 22, 2026 01:54
@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch from 0c8ed0a to 63f752b May 22, 2026 18:30
@jakemas
Contributor Author

jakemas commented May 29, 2026

@mkannwischer @hanno-becker Ready for review

Contributor

@mkannwischer mkannwischer left a comment

Thanks @jakemas for finishing the last AArch64 proof - exciting!!
I found a couple of mistakes.

Comment thread dev/aarch64_opt/src/arith_native_aarch64.h Outdated
Comment thread dev/aarch64_opt/src/arith_native_aarch64.h Outdated
Comment thread dev/aarch64_opt/src/arith_native_aarch64.h
Comment thread proofs/hol_light/aarch64/mldsa/rej_uniform_eta2_aarch64_asm.S
Comment thread proofs/hol_light/aarch64/proofs/rej_uniform_eta2_aarch64_asm.ml Outdated
Comment thread dev/aarch64_opt/src/arith_native_aarch64.h Outdated
Comment thread proofs/hol_light/aarch64/proofs/rej_uniform_eta2_aarch64_asm.ml Outdated
Comment thread proofs/hol_light/aarch64/proofs/rej_uniform_eta2_aarch64_asm.ml
@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch 2 times, most recently from 0320606 to 0541f44 June 1, 2026 20:14
@jakemas jakemas changed the title Add HOL Light rej_uniform_eta proofs for AArch64 HOL-Light: add rej_uniform_eta proofs for AArch64 Jun 1, 2026
Contributor

@mkannwischer mkannwischer left a comment

Thanks @jakemas. Looks good to me now.

(stackpointer,576)]
==> ensures arm
(\s. aligned_bytes_loaded s (word pc) mldsa_rej_uniform_eta2_mc /\
read PC s = word(pc + 4) /\
Contributor

For new proofs, let's avoid those hardcoded offsets and instead use the established pattern of introducing named offsets for preamble/postamble. I know not all proofs do this yet, but let's not make it worse.

Contributor Author

Done — added MLDSA_REJ_UNIFORM_ETA{2,4}_{PREAMBLE,POSTAMBLE}_LENGTH, _CORE_START, _CORE_END and a LENGTH_SIMPLIFY_CONV to unfold them. The hardcoded pc + 4 / pc + 364 (eta2) / pc + 336 (eta4) are replaced with named offsets in _CORRECT / _SUBROUTINE_CORRECT / _MEMSAFE / _SUBROUTINE_MEMSAFE. Left the internal loop-body offsets (pc + 256, pc + 108, etc.) as numerals since they're block-internal rather than preamble/postamble boundaries.

Contributor

Thanks!

requires(table == mld_rej_uniform_eta_table)
assigns(memory_slice(r, sizeof(int32_t) * MLDSA_N))
ensures(return_value <= MLDSA_N)
ensures(array_abs_bound(r, 0, return_value, MLDSA_ETA + 1))
Contributor

This should be a hardcoded 3 as in the HOL-Light spec. The ASM is level-specific, so the specs should not contain level-dependent macros.

Contributor Author

@jakemas jakemas Jun 3, 2026

Hardcoded 3 for eta2 with a check-magic comment.

requires(memory_no_alias(buf, buflen))
requires(table == mld_rej_uniform_eta_table)
assigns(memory_slice(r, sizeof(int32_t) * MLDSA_N))
ensures(return_value <= MLDSA_N)
Contributor

This clause, while a trivial consequence, is not part of the HOL-Light spec

Contributor Author

@jakemas jakemas Jun 3, 2026

Added outlen <= 256 to the HOL-Light _SUBROUTINE_CORRECT postcondition so it now establishes the same bound (return_value <= MLDSA_N) that CBMC requires. Also hardcoded the literal bound (5 for eta4, 3 for eta2) since the asm is level-specific.
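The combined postcondition can be made concrete with a runtime check that an output buffer must pass. The returned count must be at most 256 (`MLDSA_N`), and every written coefficient must satisfy `-bound < r[i] < bound`, with the bound hardcoded per kernel (3 for eta2, 5 for eta4). The C sketch below implements that check. The helper name `rej_eta_post_holds` is hypothetical. Reading `array_abs_bound(r, 0, n, k)` as an exclusive bound `|r[i]| < k` over `[0, n)` is an assumption about the CBMC helper, not a quote of it.

```c
#include <assert.h>
#include <stdint.h>

#define REJ_ETA_MAX_OUT 256 /* MLDSA_N */

/* Hypothetical runtime mirror of the postcondition:
 *   outlen <= 256  and  -bound < r[i] < bound  for all i < outlen.
 * bound is 3 for the eta2 kernel and 5 for the eta4 kernel (assumed
 * exclusive, matching the reading of array_abs_bound stated above). */
static int rej_eta_post_holds(const int32_t *r, unsigned outlen, int32_t bound)
{
  assert(bound > 0);
  if (outlen > REJ_ETA_MAX_OUT)
    return 0;
  for (unsigned i = 0; i < outlen; i++)
    if (r[i] <= -bound || r[i] >= bound)
      return 0;
  return 1;
}
```

Under the exclusive reading, a bound of 3 admits coefficients in [-2, 2] and 5 admits [-4, 4]. That matches `MLDSA_ETA + 1` for the respective parameter sets, which is why the literals are drop-in replacements.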

* in mldsa/src/native/aarch64/src/arith_native_aarch64.h *)

let MLDSA_REJ_UNIFORM_ETA4_SUBROUTINE_CORRECT = prove
(`!res buf buflen table (inlist:byte list) pc stackpointer returnaddress.
Contributor

@hanno-becker hanno-becker Jun 2, 2026

Blocker: This is not the right spec. Compare

(`!res buf buflen table (inlist:(24 word)list) pc stackpointer returnaddress.
The inlist here is naturally a 4 word list, and the output is a filter on exactly that list. Here, instead, REJ_SAMPLE_ETA4 does nibble-unpacking logic, which it shouldn't.

Contributor

Same for eta2 of course.

Contributor Author

@jakemas jakemas Jun 3, 2026

REJ_SAMPLE_ETA{2,4} now takes a nibble list (typed int16 list, since that's how nibbles already live in SIMD lanes) and is just MAP/FILTER. The byte→nibble unpacking is applied at the subroutine spec via REJ_SAMPLE_ETA{2,4} (NIBBLES_OF_BYTES inlist). Internally the proof body still works against a private REJ_SAMPLE_ETA{2,4}_BYTES alias bridged by a one-line lemma — kept it that way to avoid rewriting the loop machinery, which peels off 8 bytes / 16 nibbles per iteration. Is that what you had in mind?

Contributor

@hanno-becker hanno-becker Jun 3, 2026

Thank you @jakemas!

It's not yet what I had in mind: The most natural formulation of rejection sampling -- and the one used so far -- expresses the specification at the natural bitwidth. Here, the input nibbles are 4 bits wide, so the inlist should be a 4 word list. It is a beautiful property of John's setup that the machinery is completely agnostic to the bitlength here, and can express the splitting of the input range into a list of 4 words without problem.

That the algorithm internally needs to load larger chunks at a time, and split them, is not visible at the spec level, and should not be.

So I am looking for inlist : 4 word list and REJ_SAMPLE_ETA[2|4] (l:4 word list) -> 32 word list.

Does that make sense?

It should be of little significance to the proof, I hope -- you mostly need to move the logic of NIBBLE_PAIR into the proof somehow.
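A nibble-level spec in this shape is nothing more than filter-then-map. The C model below is a sketch of that idea, not the HOL-Light definition. Nibbles are stored one per `uint8_t`. The filters (`< 9` for eta4, `< 15` for eta2) and the maps (`4 - x`, `2 - (x mod 5)`) are taken from this thread. The `cap` argument stands in for the 256-coefficient output limit; how the real spec expresses that truncation is an assumption here, as are the function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of REJ_SAMPLE_ETA4: keep nibbles x < 9, map x |-> 4 - x.
 * Writes at most cap outputs and returns how many were written. */
static size_t rej_sample_eta4(int32_t *out, size_t cap,
                              const uint8_t *nibbles, size_t n)
{
  size_t k = 0;
  for (size_t i = 0; i < n && k < cap; i++) {
    assert(nibbles[i] < 16);                 /* inputs are 4-bit values */
    if (nibbles[i] < 9)                      /* FILTER (\x. val x < 9)  */
      out[k++] = 4 - (int32_t)nibbles[i];    /* MAP x |-> 4 - x         */
  }
  return k;
}

/* Hypothetical model of REJ_SAMPLE_ETA2: keep nibbles x < 15,
 * map x |-> 2 - (x mod 5). */
static size_t rej_sample_eta2(int32_t *out, size_t cap,
                              const uint8_t *nibbles, size_t n)
{
  size_t k = 0;
  for (size_t i = 0; i < n && k < cap; i++) {
    assert(nibbles[i] < 16);
    if (nibbles[i] < 15)                             /* FILTER (\x. val x < 15) */
      out[k++] = 2 - (int32_t)(nibbles[i] % 5);      /* MAP x |-> 2 - (x mod 5) */
  }
  return k;
}
```

Chunked loading, byte splitting, and SIMD lanes appear nowhere in this model. That is the separation the comment asks for: those belong to the proof, not the spec.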

Contributor

@hanno-becker hanno-becker left a comment

The spec is not yet in the right shape, see comment and precedent of rejection sampling. The CBMC specs should not use level-dependent macros.

Finally: I realize we're living in a world where agents write proofs and their readability is becoming less important, but the proofs strike me as extremely long and lacking any commentary helping a human understand what's happening. As the agent reworks the proofs, please nudge it to a) write as compactly as possible, b) add some high-level comments on the high-level steps. In particular, the proof contains multiple steps which strike me as inlining lemmas that should either be hoisted out or already be present in general form, such as

SUBGOAL_THEN
       `SUB_LIST(8 * i, 8) (inlist:byte list) =
        [word_subword (loaded_d:int64) (0,8):byte;
         word_subword loaded_d (8,8);
         word_subword loaded_d (16,8);
         word_subword loaded_d (24,8);
         word_subword loaded_d (32,8);
         word_subword loaded_d (40,8);
         word_subword loaded_d (48,8);
         word_subword loaded_d (56,8)]`

I'd also be surprised if something like

let WORD_SUBWORD_OF_JOIN_8X16 = BITBLAST_RULE
 `word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (0,16):int16 = h0 /\
  word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (16,16):int16 = h1 /\
  word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (32,16):int16 = h2 /\
  word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (48,16):int16 = h3 /\
  word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (64,16):int16 = h4 /\
  word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (80,16):int16 = h5 /\
  word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (96,16):int16 = h6 /\
  word_subword
  (word_join
   (word_join (word_join (h7:int16) (h6:int16):int32) (word_join (h5:int16) (h4:int16):int32):int64)
   (word_join (word_join (h3:int16) (h2:int16):int32) (word_join (h1:int16) (h0:int16):int32):int64):int128)
   (112,16):int16 = h7`;;

was really necessary -- please look into existing split/merge tactics.

This is not only useful for humans, but also for agents: They will have an easier time learning from and adjusting the proof if it is well-structured, and common material is hoisted out. And, like humans, they typically need prodding to turn some proof into a good proof. So let's not stop at the first proof we find but use the power of the agent to have it write a good proof before committing. Nudge the agent, but also use your own judgement and go over the proof with human eyes and see if you can follow the structure and whether there are dubious steps or steps that ought to be hoisted out into / reused from general infrastructure.
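Informally, the first identity quoted above says that an 8-byte chunk of the input, read as one 64-bit word, returns byte k as bits `[8k, 8k+8)`. The C sketch below states that property outside HOL-Light. It assumes little-endian loads, the usual AArch64 data order. `load_le64` and `subword8` are illustrative names, not helpers from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian load of 8 bytes into one 64-bit word
 * (models the 64-bit memory read; byte order is an assumption). */
static uint64_t load_le64(const uint8_t b[8])
{
  uint64_t d = 0;
  for (int k = 0; k < 8; k++)
    d |= (uint64_t)b[k] << (8 * k);
  return d;
}

/* Models word_subword d (8k, 8): bits 8k .. 8k+7 of d. */
static uint8_t subword8(uint64_t d, int k)
{
  assert(k >= 0 && k < 8);
  return (uint8_t)(d >> (8 * k));
}
```

Stated once for an arbitrary chunk, this is a general fact about loads rather than something specific to rejection sampling. That is why it is a good candidate for hoisting into shared infrastructure.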

Contributor

@hanno-becker hanno-becker left a comment

Some clarification on my previous review.

First, I should have said: Thank you @jakemas for getting this to work in the first place. Experience shows that rejection sampling is notoriously difficult, so having a working proof -- regardless of any opportunity for further improvement -- is fantastic. I should have paused before putting my review-cap on and acknowledged this -- apologies that I didn't!

Now to the specific feedback. First, the following I think would be good to address before merge:

  • Removing level-dependent constants from the CBMC specs for ASM kernels.
  • Adding an explicit length clause to the HOL-Light subroutine spec, to match the respective clause in the CBMC spec.
  • Adjusting the functional spec we're proving against to work with nibble lists rather than byte lists, so that the top-level spec of rejection sampling is just a plain filter. This makes for the simplest and most intuitive spec.

Things I'd like us to consider, but not necessarily now:

  • Avoiding using hardcoded offsets for preamble and postamble. I believe we do this consistently in mlkem-native, but inconsistently in mldsa-native. If it's easy to do, let's do it -- but if not, we can wait for when we address this uniformly.
  • Proof cleanup. As done e.g. in #113, one can defer this. What's most important is that the specs are correct and aligned and that the proof runs in acceptable time -- once that's met, we can merge and iterate. The concern I voiced is also not specific to this PR, but general: In a world where more and more proofs are AI-generated, we should pause and reflect on what the value of the individual proof still is, in that case, and in particular, to what extent we should care about the specific shape of a proof. My feeling is that the same engineering principles that we tend to apply as humans will still prove useful in an AI-world: Clean structure, separation of specifics from generalities, hoisting out the latter, etc. -- will all make subsequent work with the code-base easier, whether done by a human or by an AI. As indicated in the review, I think there is potential for improvement in the current proof.

Finally: @jakemas If I ask something in a PR of yours that's not done elsewhere either, please flag it. I want us to have a consistent discipline for all proofs, so if there's divergence, please flag it so we can decide whether to go with precedent, or whether to expand the scope of the change to other proofs (potentially deferred and tracked in an issue).

jakemas added 3 commits June 3, 2026 02:18
…ty proofs

Add functional correctness, subroutine correctness, memory-safety and
subroutine-safety proofs for the AArch64 assembly implementations of
rej_uniform_eta for both eta variants:
- rej_uniform_eta2 (eta=2, used in ML-DSA-44)
- rej_uniform_eta4 (eta=4, used in ML-DSA-65/87)

Memory safety follows the mlkem_rej_uniform_VARIABLE_TIME pattern in
s2n-bignum because the loop count is data-dependent (which nibbles
pass the < 9 / < 15 filter).

Written with the assistance of Claude Opus 4.7.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
- REJ_SAMPLE_ETA{2,4} take a nibble list (plain filter+map); the proofs
  bridge to the byte-shape interior at the subroutine spec only.
- Add `outlen <= 256` clause to the subroutine post, matching CBMC.
- Hardcode the per-coefficient bound (3 for eta2, 5 for eta4) in the
  CBMC contracts; the asm is level-specific so the spec mustn't depend
  on MLDSA_ETA.
- Drop unused REJ_SAMPLE_ETA{2,4}_{EMPTY,APPEND}, REJ_NIBBLES_COUNT_8,
  and a few duplicate helpers from the eta2 file.

Written with the assistance of Claude Opus 4.7.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Signed-off-by: Jake Massimo <jakemas@amazon.com>
@jakemas jakemas force-pushed the add-hol-light-rej-uniform-eta4 branch from 53458cd to a9f533a Compare June 3, 2026 02:18
Hoist SUB_LIST_8_BYTES_FROM_INT64 (the 8-byte chunk -> int64 split, used 4
times) into aarch64_utils.ml. Compact WORD_SUBWORD_OF_JOIN_8X16 from 42 to
16 lines using a programmatic generator.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
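Both hoisted facts are instances of one pattern: pack fixed-width lanes into a wider word, then read lane k back at offset `lane_width * k`. The C sketch below mirrors WORD_SUBWORD_OF_JOIN_8X16, holding the 128-bit value as two 64-bit halves. The type and function names are hypothetical. All eight conjuncts of the lemma differ only in the offset `16k`, so generating them programmatically, as this commit describes, is a natural fit.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit word as two 64-bit halves; lo holds bits 0..63. */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Models the nested word_join of h7..h0 from the lemma:
 * h[0] lands in bits 0..15 and h[7] in bits 112..127. */
static u128_halves join_8x16(const uint16_t h[8])
{
  u128_halves v = {0, 0};
  for (int k = 0; k < 4; k++) {
    v.lo |= (uint64_t)h[k] << (16 * k);
    v.hi |= (uint64_t)h[k + 4] << (16 * k);
  }
  return v;
}

/* Models word_subword v (16k, 16) for 0 <= k < 8. */
static uint16_t lane16(u128_halves v, int k)
{
  assert(k >= 0 && k < 8);
  uint64_t half = (k < 4) ? v.lo : v.hi;
  return (uint16_t)(half >> (16 * (k % 4)));
}
```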
@jakemas
Contributor Author

jakemas commented Jun 3, 2026

The spec is not yet in the right shape, see comment and precedent of rejection sampling. The CBMC specs should not use level-dependent macros.

Ok addressing this with 2e48fa7

…twidth

The public spec now operates on a (4 word) list -- the natural bitwidth of
a nibble -- rather than int16 list. Callers lift their byte buffer into a
nibble list via BYTES_TO_NIBBLES, which lives in common/mldsa_specs.ml.
The byte->int16 nibble form (NIBBLES_OF_BYTES) is kept inside the proof
file as a private helper to bridge to the byte-shape interior; the bridge
goes through a one-line lemma at the subroutine spec.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Comment thread proofs/hol_light/common/mldsa_specs.ml Outdated
Comment on lines +2040 to +2054
(* Splits each input byte into its low and high 4-bit nibbles, expressed at *)
(* the natural bitwidth (4 word). The output is twice the length of the *)
(* input. Used by callers of REJ_SAMPLE_ETA{2,4} to lift a byte buffer into *)
(* a nibble list. *)
let BYTES_TO_NIBBLES = define
`BYTES_TO_NIBBLES [] = ([]:(4 word) list) /\
BYTES_TO_NIBBLES (CONS (b:byte) t) =
APPEND [word(val b MOD 16):4 word; word(val b DIV 16):4 word]
(BYTES_TO_NIBBLES t)`;;

let BYTES_TO_NIBBLES_APPEND = prove
(`!l1 l2. BYTES_TO_NIBBLES(APPEND l1 l2) =
APPEND (BYTES_TO_NIBBLES l1) (BYTES_TO_NIBBLES l2)`,
LIST_INDUCT_TAC THEN
ASM_REWRITE_TAC[BYTES_TO_NIBBLES; APPEND; APPEND_ASSOC]);;
Contributor

This is not needed in mldsa_specs.ml anymore I believe and should go in the proof script.

Contributor Author

Done — removed from mldsa_specs.ml.
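A C rendering of the BYTES_TO_NIBBLES definition above makes the nibble order easy to check: the low nibble (`b mod 16`) comes first, then the high nibble (`b div 16`). The function name is hypothetical, and each nibble is stored in its own `uint8_t`. Each byte is expanded independently, so expanding a concatenation gives the concatenation of the expansions. That is exactly what BYTES_TO_NIBBLES_APPEND states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of BYTES_TO_NIBBLES: byte b becomes
 * [b mod 16; b div 16], so out (length 2 * n) holds the low nibble
 * of each byte first, then its high nibble. */
static void bytes_to_nibbles(uint8_t *out, const uint8_t *in, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    out[2 * i] = (uint8_t)(in[i] % 16);
    out[2 * i + 1] = (uint8_t)(in[i] / 16);
    assert(out[2 * i] < 16 && out[2 * i + 1] < 16);
  }
}
```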

Comment on lines +2057 to +2061
`REJ_SAMPLE_ETA2 (l:(4 word) list) =
MAP (\x:4 word.
word_sx(word_sub (word 2:int16)
(word_umod (word_zx x:int16) (word 5))):int32)
(FILTER (\x:4 word. val x < 15) l)`;;
Contributor

This has the right shape, but can be simpler: The MAP can do the 2 - (x umod 5) computation in 4 word, and ultimately word_sx from 4 word -> 32 word.

Contributor

Actually, for a specification it is most natural to work in the integers where possible, which would mean MAPing over 4 word -> int -> int -> 32 word, where 4 word -> int is unsigned val, int -> int is x |-> 2 - (x umod 5), and int -> 32 word is signed interpretation.

Could you also check if HOL-Light provides a MAP_FILTER primitive?

Contributor Author

Will apply to REJ_SAMPLE_ETA2 too — coming in a follow-up commit. (Tested the per-element bridge; word_umod doesn't BITBLAST cleanly so the eta2 bridge takes a 15-case enumeration on val x, but it goes through.)

On MAP_FILTER: HOL Light has FILTER_MAP : FILTER P (MAP f l) = MAP f (FILTER (P o f) l), but no fused MAP_FILTER combinator at the term level — only mapfilter as an OCaml-level helper. I don't think there's a cleaner spec form available here.
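This thread discusses three candidate shapes for the eta2 map:

- the current int16 detour;
- a 4-bit computation followed by a single sign extension;
- the integer formulation.

All three should agree on every nibble that passes the `< 15` filter. The C sketch below spells out each one so the agreement can be checked exhaustively. The function names are hypothetical, and the C arithmetic only models the HOL-Light `word` operations; it is not a substitute for the bridge lemma.

```c
#include <assert.h>
#include <stdint.h>

/* Models word_sx from 4 bits to 32 bits: read the low 4 bits of v
 * as a two's-complement value in [-8, 7]. */
static int32_t sx4(uint32_t v)
{
  v &= 0xFu;
  return (v & 0x8u) ? (int32_t)v - 16 : (int32_t)v;
}

/* Integer formulation: val x (x assumed < 16), then x |-> 2 - (x mod 5). */
static int32_t eta2_int(uint32_t x) { return 2 - (int32_t)(x % 5u); }

/* 4-bit formulation: 2 - (x umod 5) computed modulo 16,
 * then a single sign extension to 32 bits. */
static int32_t eta2_4bit(uint32_t x)
{
  return sx4((2u - x % 5u) & 0xFu);
}

/* Current int16 detour: zero-extend x to 16 bits, umod 5, subtract from 2
 * modulo 2^16, then sign-extend the 16-bit result to 32 bits. */
static int32_t eta2_int16(uint32_t x)
{
  uint32_t r = (2u - (x & 0xFu) % 5u) & 0xFFFFu;
  return (r & 0x8000u) ? (int32_t)r - 65536 : (int32_t)r;
}
```

The formulations agree because `2 - (x mod 5)` always lies in [-2, 2], which fits in a signed 4-bit word. Nothing is lost before the final sign extension.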

Comment thread proofs/hol_light/common/mldsa_specs.ml Outdated
Comment on lines +2065 to +2066
MAP (\x:4 word.
word_sx(word_sub (word 4:int16) (word_zx x:int16)):int32)
Contributor

Depending on what we do in REJ_SAMPLE_ETA2, this should be similar: Either work mostly in 4 word, or go through int; and, if present, use a MAP_FILTER primitive.

Contributor Author

REJ_SAMPLE_ETA4 now does the (4 - x) computation in 4 word and word_sx's once to int32 at the end. REJ_SAMPLE_ETA2 will get the same treatment in a follow-up commit.
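For eta4, a range argument is what makes the 4-bit formulation sound. For every `x < 9`, `4 - x` lies in [-4, 4] and survives a 4-bit wrap-around plus sign extension. The C sketch below checks this, modelling `word_sub` and `word_sx` with masks. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models word_sx from 4 bits to 32 bits. */
static int32_t sx4(uint32_t v)
{
  v &= 0xFu;
  return (v & 0x8u) ? (int32_t)v - 16 : (int32_t)v;
}

/* 4-bit formulation: (4 - x) modulo 16, then one sign extension. */
static int32_t eta4_4bit(uint32_t x) { return sx4((4u - x) & 0xFu); }

/* Integer reading of the eta4 map x |-> 4 - x. */
static int32_t eta4_int(uint32_t x) { return 4 - (int32_t)x; }
```

Outside the filter the equivalence can fail. For example, `x = 13` gives `-9` in the integers, which does not fit in a signed 4-bit word, so the 4-bit computation yields `7`. The `< 9` filter is therefore what the per-element bridge relies on.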

The MAP body now does the (4 - x) computation in 4 word and word_sx's once
to int32 at the end, removing the int16 detour. BYTES_TO_NIBBLES_APPEND
moved out of mldsa_specs.ml (was unused there).

Signed-off-by: Jake Massimo <jakemas@amazon.com>

Development

Successfully merging this pull request may close these issues.

HOL-Light: Prove AArch64 rej_uniform_eta

4 participants