Skip to content

perf(stark): eliminate redundant bit-reversal permutations in prover …#1187

Open
diegokingston wants to merge 2 commits intomainfrom
opt/eliminate-redundant-bit-reversals
Open

perf(stark): eliminate redundant bit-reversal permutations in prover …#1187
diegokingston wants to merge 2 commits intomainfrom
opt/eliminate-redundant-bit-reversals

Conversation

@diegokingston
Copy link
Collaborator

…pipeline

The NR DIT FFT naturally produces bit-reversed output, but the prover was round-tripping through natural order before bit-reversing again for Merkle tree commitment. This added ~2 unnecessary O(N) random-access permutations per trace column, composition poly part, and FRI layer.

Add evaluate_offset_fft_bitrev to the polynomial API (skips the final bit-reverse permute in the FFT), and use it directly in:

  • Trace commitment (main + aux): bitrev FFT → simple transpose → Merkle
  • Composition poly commitment: bitrev FFT → pair → Merkle
  • FRI layer construction: bitrev FFT → pair → Merkle

Natural-order evaluations are still computed separately for constraint evaluation (LDETraceTable) and opening queries, where sequential access matters for cache performance.

…pipeline

The NR DIT FFT naturally produces bit-reversed output, but the prover was
round-tripping through natural order before bit-reversing again for Merkle
tree commitment. This added ~2 unnecessary O(N) random-access permutations
per trace column, composition poly part, and FRI layer.

Add evaluate_offset_fft_bitrev to the polynomial API (skips the final
bit-reverse permute in the FFT), and use it directly in:
- Trace commitment (main + aux): bitrev FFT → simple transpose → Merkle
- Composition poly commitment: bitrev FFT → pair → Merkle
- FRI layer construction: bitrev FFT → pair → Merkle

Natural-order evaluations are still computed separately for constraint
evaluation (LDETraceTable) and opening queries, where sequential access
matters for cache performance.
@diegokingston diegokingston requested a review from a team as a code owner February 25, 2026 01:29
@github-actions
Copy link

Kimi AI Review

Review Feedback

File: crates/math/src/fft/cpu/ops.rs

Changes:

  • Added fft_bitrev function to compute FFT in bit-reversed order. (diff)

Feedback:

  • The new fft_bitrev function is a useful optimization for scenarios where the output of the FFT will be committed to a Merkle tree in bit-reversed order. This avoids an unnecessary permutation round-trip.
  • Ensure that the fft_bitrev function correctly handles edge cases, such as when the input length is not a power of two. The error handling seems appropriate.
  • The implementation should be tested to ensure that it produces the same results as the standard fft function but in bit-reversed order.

File: crates/math/src/fft/polynomial.rs

Changes:

  • Added evaluate_fft_bitrev and evaluate_offset_fft_bitrev methods to Polynomial. (diff)

Feedback:

  • These methods are necessary to support bit-reversed FFT evaluations at the polynomial level, which is useful for Merkle tree commitments.
  • The error handling in evaluate_fft_bitrev seems appropriate, especially checking for domain size errors.
  • The implementation should be tested to ensure consistency with the standard evaluate_fft methods but producing bit-reversed results.

File: crates/provers/stark/src/fri/mod.rs

Changes:

  • Replaced in_place_bit_reverse_permute with direct use of evaluate_offset_fft_bitrev. (diff)

Feedback:

  • This change is correct and optimizes the code by avoiding unnecessary permutations. Ensure that the bit-reversed evaluations are correctly used in the subsequent steps.

File: crates/provers/stark/src/prover.rs

Changes:

  • Added evaluate_polynomial_on_lde_domain_bitrev and compute_lde_trace_evaluations_bitrev methods. (diff)

Feedback:

  • These methods are crucial for optimizing the prover's performance by avoiding unnecessary permutations when committing to Merkle trees.
  • The compute_lde_trace_evaluations_bitrev method should be thoroughly tested to ensure it produces the expected bit-reversed evaluations.
  • The change to use columns2rows instead of columns2rows_bit_reversed is correct and should be verified through testing.

Overall, the changes aim to optimize the performance of the Stark prover by reducing unnecessary permutations when committing to Merkle trees. The changes are well-justified and should be thoroughly tested to ensure correctness and performance improvements.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review by ChatGPT

in_place_nr_2radix_fft(&mut results, twiddles);

Ok(results)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness

  • FFT Implementation: Ensure that in_place_nr_2radix_fft maintains the correctness properties of the FFT, especially the handling of edge cases like inputs of size one or power-of-two constraints. The call to is_power_of_two() should ensure this, but audit the helper function for correctness in recursion and base cases.

Security

  • Timing Side-Channels: Verify that in_place_nr_2radix_fft operates in constant time. If the algorithm's operations depend on secret data, ensure it does not lead to timing side-channel vulnerabilities.
  • Sensitive Data Zeroization: After operations, if any secret data persists in results, ensure it is zeroized to prevent unintended disclosure.

Performance

  • Unnecessary Allocations: Converting input to vector with input.to_vec() results in an allocation. Ensure this conversion is necessary, and consider using a different approach if input can be operated on in-place.
  • Redundant Calculations: Confirm that twiddles are precomputed efficiently, and all inverse calculations (like inverses and field operations in the FFT) are optimized.

Bugs & Errors

  • Off-by-One Errors: Since FFT involves array indexing, double-check accesses to results and twiddles for potential off-by-one errors.
  • Error Handling: While FFTError::InputError(input.len()) is returned for non-power-of-two sizes, ensure this is a comprehensive error management strategy for other potential issues within the FFT process.

Code Simplicity

  • Abstraction Quality: The new function has specific use cases, i.e., creating Merkle trees. Ensure it leverages existing abstractions efficiently and doesn't introduce unnecessary complexity. Create clear documentation for the circumstances that fft_bitrev should be used over fft.

In summary, verify correctness in edge cases, constant-time operation of FFT logic, and efficient handling of inputs without unnecessary allocations. Enhanced documentation to aid understanding of when fft_bitrev should be preferred is also recommended before merging.


pub fn interpolate_fft_cpu<F, E>(
fft_evals: &[FieldElement<E>],
) -> Result<Polynomial<FieldElement<E>>, FFTError>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness

  • Domain Size Check: In evaluate_fft_bitrev, the check if len.trailing_zeros() as u64 > F::TWO_ADICITY should probably use >= to ensure correctness when exactly equal to TWO_ADICITY.
  • Edge Case Handling: The function evaluate_fft_bitrev handles empty polynomials correctly by returning zeros. However, ensure this behavior is intended, specifically for downstream applications.

Security

  • Timing Side-channels: Ensure that operations involved in FFT (such as permutations and bit-reversals) are constant-time, especially if working over secret data.
  • Randomness: No new randomness operations are introduced in the changes, but ensure randomness used elsewhere in FFT or cryptographic commitments is secure.

Performance

  • Redundant Operations: The function evaluate_fft_bitrev duplicates some logic of evaluate_fft. Consider DRY (Don't Repeat Yourself) principle by refactoring common logic into a helper function.
  • Unnecessary Allocations: The coeffs vector is resized with zeros before evaluate_fft_bitrev_cpu is called, which might incur unnecessary allocation if coeffs is already of the needed length.

Bugs & Errors

  • Handling Errors: In evaluate_fft_bitrev_cpu, ensure that the construction of twiddles correctly handles potential errors from get_twiddles, which seems covered but should be verified.

Code Simplicity

  • Complexity: Overall, the functions added are simple, though there is some duplication with existing functionality. Consider refactoring to improve maintainability.

In summary, correctness around domain size check and constant-time assurance for security are the primary concerns. Some refactoring could resolve performance and DRY issues. Address these concerns before considering merging.

Polynomial::evaluate_offset_fft_bitrev(poly, 1, Some(domain_size), coset_offset)?;

let mut to_commit = Vec::new();
for chunk in evaluation.chunks(2) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness

  • The change of using evaluate_offset_fft_bitrev directly instead of performing manual bit reversal after FFT suggests a more optimized path, avoiding an unnecessary round-trip. However, ensure that evaluate_offset_fft_bitrev is thoroughly tested, producing the same results in different contexts, particularly with edge cases like the zero polynomial and others.

Security

  • Ensure that operations over sensitive data, such as those involving secret keys or cryptographic randomness, are constant-time to prevent timing side-channel attacks. Double-check that no secret-dependent branching remains in these operations.
  • Evaluate the use of cryptographically secure randomness where applicable, especially in zero-knowledge proof systems.
  • If not already handled, sensitive data must be zeroized after use to prevent memory scraping attacks.

Performance

  • The change improves performance slightly by avoiding redundant permutation steps in FFT processing. The consolidation into a single call (evaluate_offset_fft_bitrev) is a beneficial optimization.

Bugs & Errors

  • No new Rust language-specific errors related to unwrap or potential panics introduced in this section of the code. Ensure that other parts using this code handle possible errors without panics.

Code Simplicity

  • The reduction of unnecessary bit-reversing simplifies the code, making the implementation more streamlined and easier to maintain. No other issues with code complexity observed in the specific section provided.

Overall

  • The changes here look mostly beneficial, but more comprehensive testing should confirm that similar optimizations apply across the board without edge case regressions. Ensure all mathematical operations remain valid, especially in zero-knowledge proof and polynomial operations.

Self::commit_composition_polynomial_bitrev(&lde_composition_poly_parts_bitrev)
else {
return Err(ProvingError::EmptyCommitment);
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness:

  1. Polynomial Operations: Ensure that evaluate_offset_fft_bitrev and related polynomial evaluation functions are correctly implemented and verified for correctness.

  2. Domain-Size Edge Cases: The method evaluate_polynomial_on_lde_domain_bitrev uses a division by domain_size * blowup_factor. Make sure that these values are never zero to avoid division by zero errors.

  3. Transposition Logic: In commit_composition_polynomial_bitrev, ensure that transposition logic correctly handles cases for different domain sizes and polynomial lengths, confirming that no data is inadvertently lost or incorrectly rearranged.

  4. Handling of Bit-Reversed Data: The function commit_composition_polynomial_bitrev assumes data is already in bit-reversed format. Double-check that all inputs meet this assumption to avoid logic errors in FFT operations.

Security:

  1. Constant-Time Operation: Review all FFT and modular arithmetic operations for constant-time implementations to mitigate timing attacks, especially those near secret values.

  2. Zeroization: Ensure that sensitive data, particularly the evaluations and results related to cryptographic operations, are zeroized after use.

  3. Randomness: Verify that cryptographically secure randomness is used in all areas requiring randomness, such as field element generation.

Bugs & Errors:

  1. Potential Panics: Ensure code involving indexing and slicing safely handles all possible ranges and does not lead to panics.

  2. Error Handling: The new functions returning Result types should properly handle and propagate errors instead of potentially ignoring them.

  3. Integer Overflow/Underflow: Rust's built-in checks mitigate this risk, but manual verification can confirm there are no logical issues leading to unintended overflows.

Performance:

  1. Redundant Computations: Check if there is any unnecessary duplication of polynomial evaluation where bit-reversing could be unified to reduce computational overhead (e.g., avoid double evaluations when mapping directly to desired order output).

  2. Memory Usage: Look for optimization opportunities in evaluate_polynomial_on_lde_domain_bitrev where vector manipulations and storage could be streamlined to reduce allocations.

Code Simplicity:

  1. Abstraction Consistency: The introduction of bit-reversed specific functions increases complexity slightly. Consider whether these can be integrated more harmoniously with existing functions to minimize the expansion of the API surface.

Overall, the proposed changes introduce modifications in FFT and polynomial evaluations while aiming to optimize bit-reversal requirements. These need careful verification in practice to ensure no edge cases or unintended errors arise during both typical and boundary-case usage.

@greptile-apps
Copy link

greptile-apps bot commented Feb 25, 2026

Greptile Summary

Optimizes STARK prover pipeline by eliminating ~2 redundant O(N) bit-reversal permutations per trace column, composition polynomial part, and FRI layer. The NR DIT FFT naturally produces bit-reversed output, but the previous implementation round-tripped through natural order before bit-reversing again for Merkle tree commitment.

Key changes:

  • Added fft_bitrev, evaluate_fft_bitrev, and evaluate_offset_fft_bitrev APIs to skip final bit-reverse permutation
  • Implemented dual-path evaluation: natural-order for constraint evaluation and opening queries (where sequential access matters for cache performance), bit-reversed for Merkle commitments (avoiding permutation overhead)
  • Updated trace commitment (main + aux), composition polynomial commitment, and FRI layer construction to use bitrev FFT directly
  • Replaced columns2rows_bit_reversed with columns2rows on already-bitrev data, eliminating the fused bit-reverse operation

The refactor maintains correctness by computing both evaluation orders where needed - natural order for reverse_index-based queries and bit-reversed for tree construction.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - well-structured performance optimization with clear correctness preservation
  • The PR correctly implements a performance optimization by avoiding redundant permutations. The dual-path strategy (natural-order for queries, bit-reversed for commitments) is properly implemented across all use cases. The changes are minimal, focused, and maintain the same API contracts. The new bitrev functions simply skip the final permutation step that was immediately reversed anyway, and the prover correctly computes both evaluation orders where needed.
  • No files require special attention

Important Files Changed

Filename Overview
crates/math/src/fft/cpu/ops.rs Added fft_bitrev function to skip final bit-reverse permutation, correctly reuses existing FFT implementation
crates/math/src/fft/polynomial.rs Added evaluate_fft_bitrev and evaluate_offset_fft_bitrev API methods with proper error handling
crates/provers/stark/src/fri/mod.rs Refactored FRI layer construction to use bitrev FFT directly, eliminating redundant permutation
crates/provers/stark/src/prover.rs Dual-path evaluation strategy: natural-order for queries, bit-reversed for Merkle commitments, correctly implemented

Last reviewed commit: 90166de

@codecov-commenter
Copy link

codecov-commenter commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 97.19626% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.73%. Comparing base (f33e79d) to head (c3c8e7d).

Files with missing lines Patch % Lines
crates/math/src/fft/cpu/ops.rs 90.90% 1 Missing ⚠️
crates/math/src/fft/polynomial.rs 97.22% 1 Missing ⚠️
crates/provers/stark/src/prover.rs 98.27% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1187      +/-   ##
==========================================
+ Coverage   75.69%   75.73%   +0.03%     
==========================================
  Files         205      205              
  Lines       46577    46667      +90     
==========================================
+ Hits        35257    35341      +84     
- Misses      11320    11326       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

Kimi AI Review

Review Feedback

crates/math/src/fft/cpu/ops.rs

File: crates/math/src/fft/cpu/ops.rs

  1. New Function fft_bitrev:
    • Line 24-34: The new function fft_bitrev is introduced to perform FFT and return results in bit-reversed order. This is a useful optimization for Merkle tree commitments.
    • Line 27: The check for input.len().is_power_of_two() is correct and necessary for FFT operations.
    • Line 29: The cloning of input to results is necessary for in-place operations but could be optimized to avoid the clone if the input is not modified elsewhere.

Suggestions:

  • Consider optimizing memory usage by avoiding the clone in fft_bitrev if possible.

crates/math/src/fft/polynomial.rs

File: crates/math/src/fft/polynomial.rs

  1. New Functions evaluate_fft_bitrev and evaluate_offset_fft_bitrev:
    • Line 161-190: These functions are introduced to evaluate polynomials using FFT and return results in bit-reversed order.
    • Line 165: The check for len.trailing_zeros() is correct to ensure the domain size is valid for the FFT operation.
    • Line 169: The resize operation to len is necessary to zero-pad the coefficients, which is correct.

Suggestions:

  • Ensure that the fft_bitrev_cpu function in ops is correctly implemented to handle the bit-reversed order, as it's called from these functions.

crates/provers/stark/src/fri/mod.rs

File: crates/provers/stark/src/fri/mod.rs

  1. Use of evaluate_offset_fft_bitrev:
    • Line 136-148: The use of evaluate_offset_fft_bitrev is correct to avoid redundant bit-reversing operations.
    • Line 141: The error handling for domain size is appropriate.

Suggestions:

  • No specific issues identified.

crates/provers/stark/src/prover.rs

File: crates/provers/stark/src/prover.rs

  1. Bit-Reversed Order Evaluations:
    • Line 212-301: The changes to use bit-reversed order evaluations for Merkle tree commitments are correct and improve efficiency.
    • Line 278-300: The new evaluate_polynomial_on_lde_domain_bitrev function is correctly used to avoid redundant bit-reversing operations.

Suggestions:

  • Ensure that all dependencies and calls to FFT-related functions are updated to handle bit-reversed order correctly.
  • Review the entire codebase for any other instances where bit-reversed order can be utilized to optimize performance.

Overall:
The changes are well-implemented and improve the efficiency of the library by avoiding redundant operations. The introduction of bit-reversed order evaluations is a significant optimization for Merkle tree commitments. Ensure thorough testing to verify the correctness of these changes, especially in the context of the entire proving process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants