perf(stark): eliminate redundant bit-reversal permutations in prover …#1187

diegokingston wants to merge 2 commits into main from …
Conversation
…pipeline

The NR DIT FFT naturally produces bit-reversed output, but the prover was round-tripping through natural order before bit-reversing again for Merkle tree commitment. This added ~2 unnecessary O(N) random-access permutations per trace column, composition poly part, and FRI layer.

Add evaluate_offset_fft_bitrev to the polynomial API (it skips the final bit-reverse permute in the FFT), and use it directly in:

- Trace commitment (main + aux): bitrev FFT → simple transpose → Merkle
- Composition poly commitment: bitrev FFT → pair → Merkle
- FRI layer construction: bitrev FFT → pair → Merkle

Natural-order evaluations are still computed separately for constraint evaluation (LDETraceTable) and opening queries, where sequential access matters for cache performance.
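The round trip described above can be made concrete with a toy sketch: a natural-in, bit-reversed-out radix-2 FFT over Z_17 (decimation-in-frequency butterflies; this is an illustration, not the lambdaworks field code or its `in_place_nr_2radix_fft`). The output is already in the order a Merkle commitment can consume, and the final `bit_reverse_permute` pass is exactly the O(N) random-access step the PR avoids.

```rust
const P: u64 = 17; // toy prime; 13 is a primitive 4th root of unity mod 17

/// Natural-order-in, bit-reversed-out radix-2 FFT (DIF butterflies).
fn fft_nr(a: &mut [u64], omega: u64) {
    let n = a.len();
    debug_assert!(n.is_power_of_two());
    let mut len = n;
    let mut w_len = omega; // primitive `len`-th root for the current stage
    while len >= 2 {
        let half = len / 2;
        for start in (0..n).step_by(len) {
            let mut w = 1u64;
            for i in 0..half {
                let (u, v) = (a[start + i], a[start + i + half]);
                a[start + i] = (u + v) % P;
                a[start + i + half] = (u + P - v) % P * w % P;
                w = w * w_len % P;
            }
        }
        w_len = w_len * w_len % P; // square the root for the next stage
        len = half;
    }
}

/// The O(N) random-access permutation the prover no longer needs before committing.
fn bit_reverse_permute(a: &mut [u64]) {
    let n = a.len();
    let shift = usize::BITS - n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> shift;
        if i < j {
            a.swap(i, j);
        }
    }
}

fn mod_pow(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

/// Schoolbook DFT, used as the correctness reference.
fn naive_dft(a: &[u64], omega: u64) -> Vec<u64> {
    (0..a.len())
        .map(|k| {
            a.iter().enumerate().fold(0u64, |acc, (j, &x)| {
                (acc + x * mod_pow(omega, (j * k) as u64)) % P
            })
        })
        .collect()
}

fn main() {
    let mut evals = vec![1, 2, 3, 4];
    fft_nr(&mut evals, 13);
    // Output is the DFT in bit-reversed order: commit to it as-is...
    let mut natural = evals.clone();
    bit_reverse_permute(&mut natural); // ...or pay one more O(N) pass.
    assert_eq!(natural, naive_dft(&[1, 2, 3, 4], 13));
    println!("bitrev output: {:?}, natural order: {:?}", evals, natural);
}
```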
Kimi AI Review — Review Feedback

File:
```rust
in_place_nr_2radix_fft(&mut results, twiddles);

Ok(results)
}
```
Correctness
- FFT Implementation: Ensure that `in_place_nr_2radix_fft` maintains the correctness properties of the FFT, especially the handling of edge cases like inputs of size one or power-of-two constraints. The call to `is_power_of_two()` should ensure this, but audit the helper function for correctness in recursion and base cases.
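A minimal sketch of the size guard the review refers to, using Rust's built-in `usize::is_power_of_two` (the error type here is hypothetical, not the actual lambdaworks `FFTError` definition):

```rust
// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq)]
enum FftSizeError {
    NotPowerOfTwo(usize),
}

/// Validate an FFT input length and return its order (log2 of the length).
fn check_fft_input_len(len: usize) -> Result<u32, FftSizeError> {
    // `is_power_of_two` returns false for 0, so zero-length and
    // non-power-of-two inputs are rejected by a single branch; a length
    // of 1 is a valid (trivial) FFT of order 0.
    if !len.is_power_of_two() {
        return Err(FftSizeError::NotPowerOfTwo(len));
    }
    Ok(len.trailing_zeros())
}

fn main() {
    assert_eq!(check_fft_input_len(1), Ok(0));
    assert_eq!(check_fft_input_len(8), Ok(3));
    assert_eq!(check_fft_input_len(12), Err(FftSizeError::NotPowerOfTwo(12)));
    assert_eq!(check_fft_input_len(0), Err(FftSizeError::NotPowerOfTwo(0)));
}
```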
Security
- Timing Side-Channels: Verify that `in_place_nr_2radix_fft` operates in constant time. If the algorithm's operations depend on secret data, ensure it does not lead to timing side-channel vulnerabilities.
- Sensitive Data Zeroization: After operations, if any secret data persists in `results`, ensure it is zeroized to prevent unintended disclosure.
Performance
- Unnecessary Allocations: Converting the input to a vector with `input.to_vec()` results in an allocation. Ensure this conversion is necessary, and consider a different approach if `input` can be operated on in place.
- Redundant Calculations: Confirm that `twiddles` are precomputed efficiently, and that all inverse calculations (inverses and field operations in the FFT) are optimized.
Bugs & Errors
- Off-by-One Errors: Since FFT involves array indexing, double-check accesses to `results` and `twiddles` for potential off-by-one errors.
- Error Handling: While `FFTError::InputError(input.len())` is returned for non-power-of-two sizes, ensure this is a comprehensive error-management strategy for other potential issues within the FFT process.
Code Simplicity
- Abstraction Quality: The new function has specific use cases, i.e., creating Merkle trees. Ensure it leverages existing abstractions efficiently and doesn't introduce unnecessary complexity. Create clear documentation for the circumstances in which `fft_bitrev` should be used over `fft`.
In summary, verify correctness in edge cases, constant-time operation of FFT logic, and efficient handling of inputs without unnecessary allocations. Enhanced documentation to aid understanding of when `fft_bitrev` should be preferred is also recommended before merging.
```rust
pub fn interpolate_fft_cpu<F, E>(
    fft_evals: &[FieldElement<E>],
) -> Result<Polynomial<FieldElement<E>>, FFTError>
```
Correctness
- Domain Size Check: In `evaluate_fft_bitrev`, the check `if len.trailing_zeros() as u64 > F::TWO_ADICITY` should probably use `>=` to ensure correctness when exactly equal to `TWO_ADICITY`.
- Edge Case Handling: The function `evaluate_fft_bitrev` handles empty polynomials correctly by returning zeros. However, ensure this behavior is intended, specifically for downstream applications.
Security
- Timing Side-channels: Ensure that operations involved in FFT (such as permutations and bit-reversals) are constant-time, especially if working over secret data.
- Randomness: No new randomness operations are introduced in the changes, but ensure randomness used elsewhere in FFT or cryptographic commitments is secure.
Performance
- Redundant Operations: The function `evaluate_fft_bitrev` duplicates some logic of `evaluate_fft`. Consider the DRY (Don't Repeat Yourself) principle by refactoring common logic into a helper function.
- Unnecessary Allocations: The `coeffs` vector is resized with zeros before `evaluate_fft_bitrev_cpu` is called, which might incur unnecessary allocation if `coeffs` is already of the needed length.
Bugs & Errors
- Handling Errors: In `evaluate_fft_bitrev_cpu`, ensure that the construction of `twiddles` correctly handles potential errors from `get_twiddles`; this appears covered but should be verified.
Code Simplicity
- Complexity: Overall, the functions added are simple, though there is some duplication with existing functionality. Consider refactoring to improve maintainability.
In summary, correctness around domain size check and constant-time assurance for security are the primary concerns. Some refactoring could resolve performance and DRY issues. Address these concerns before considering merging.
```rust
Polynomial::evaluate_offset_fft_bitrev(poly, 1, Some(domain_size), coset_offset)?;

let mut to_commit = Vec::new();
for chunk in evaluation.chunks(2) {
```
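Why `chunks(2)` on bit-reversed evaluations yields the sibling pairs FRI needs: over a power-of-two multiplicative domain, natural indices j and j + n/2 evaluate the polynomial at negated points, and bit reversal maps exactly that index pair to adjacent slots 2i and 2i + 1. A toy check over Z_17 (omega = 13 is a primitive 4th root of unity there; this is not the production field, and `bit_reverse_permute` is a stand-in helper):

```rust
const P: u64 = 17;

fn bit_reverse_permute<T>(a: &mut [T]) {
    let n = a.len();
    let shift = usize::BITS - n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> shift;
        if i < j {
            a.swap(i, j);
        }
    }
}

fn main() {
    // Natural-order domain points: successive powers of omega = 13 mod 17.
    let mut points = vec![1u64];
    for _ in 0..3 {
        points.push(points.last().unwrap() * 13 % P);
    }
    assert_eq!(points, vec![1, 13, 16, 4]); // note 16 ≡ -1, 4 ≡ -13 (mod 17)
    bit_reverse_permute(&mut points);
    // After bit reversal, every adjacent pair sums to 0 mod p, i.e. is (x, -x).
    for pair in points.chunks(2) {
        assert_eq!((pair[0] + pair[1]) % P, 0);
    }
    println!("bitrev domain: {:?}", points);
}
```

This is the structural fact that lets the prover build FRI/composition Merkle leaves by pairing adjacent bit-reversed evaluations without any reshuffling.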
Correctness
- Using `evaluate_offset_fft_bitrev` directly, instead of performing a manual bit reversal after the FFT, takes a more optimized path that avoids an unnecessary round trip. However, ensure that `evaluate_offset_fft_bitrev` is thoroughly tested and produces the same results in different contexts, particularly with edge cases like the zero polynomial.
Security
- Ensure that operations over sensitive data, such as those involving secret keys or cryptographic randomness, are constant-time to prevent timing side-channel attacks. Double-check that no secret-dependent branching remains in these operations.
- Evaluate the use of cryptographically secure randomness where applicable, especially in zero-knowledge proof systems.
- If not already handled, sensitive data must be zeroized after use to prevent memory scraping attacks.
Performance
- The change improves performance by avoiding redundant permutation steps in FFT processing. The consolidation into a single call (`evaluate_offset_fft_bitrev`) is a beneficial optimization.
Bugs & Errors
- No new Rust-specific errors related to `unwrap` or potential panics are introduced in this section of the code. Ensure that other parts using this code handle possible errors without panics.
Code Simplicity
- The reduction of unnecessary bit-reversing simplifies the code, making the implementation more streamlined and easier to maintain. No other issues with code complexity observed in the specific section provided.
Overall
- The changes here look mostly beneficial, but more comprehensive testing should confirm that similar optimizations apply across the board without edge case regressions. Ensure all mathematical operations remain valid, especially in zero-knowledge proof and polynomial operations.
```rust
Self::commit_composition_polynomial_bitrev(&lde_composition_poly_parts_bitrev)
else {
    return Err(ProvingError::EmptyCommitment);
};
```
Correctness:

- Polynomial Operations: Ensure that `evaluate_offset_fft_bitrev` and related polynomial evaluation functions are correctly implemented and verified for correctness.
- Domain-Size Edge Cases: The method `evaluate_polynomial_on_lde_domain_bitrev` uses a division by `domain_size * blowup_factor`. Make sure these values are never zero, to avoid division-by-zero errors.
- Transposition Logic: In `commit_composition_polynomial_bitrev`, ensure the transposition logic correctly handles different domain sizes and polynomial lengths, confirming that no data is inadvertently lost or incorrectly rearranged.
- Handling of Bit-Reversed Data: The function `commit_composition_polynomial_bitrev` assumes data is already in bit-reversed format. Double-check that all inputs meet this assumption to avoid logic errors in FFT operations.
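The transposition step referenced above is, in essence, a regrouping of column-major evaluations (one vector per trace column or composition part, each already in bit-reversed order) into row-major Merkle leaves, so each leaf holds one full row. A sketch with plain integers; the helper name is illustrative, not the prover's actual function:

```rust
/// Regroup per-column evaluations into per-row leaf vectors.
/// Because every column shares the same (bit-reversed) ordering,
/// transposing preserves that ordering for the committed leaves.
fn transpose_columns(columns: &[Vec<u64>]) -> Vec<Vec<u64>> {
    let rows = columns.first().map_or(0, |c| c.len());
    // Ragged input would silently drop or misalign data: treat it as a bug.
    debug_assert!(columns.iter().all(|c| c.len() == rows));
    (0..rows)
        .map(|r| columns.iter().map(|col| col[r]).collect())
        .collect()
}

fn main() {
    // Two columns of four bit-reversed evaluations each.
    let cols = vec![vec![10, 20, 30, 40], vec![11, 21, 31, 41]];
    let leaves = transpose_columns(&cols);
    assert_eq!(leaves.len(), 4); // one leaf per domain point
    assert_eq!(leaves[0], vec![10, 11]); // leaf r gathers every column's entry at r
    assert_eq!(leaves[3], vec![40, 41]);
}
```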
Security:

- Constant-Time Operation: Review all FFT and modular arithmetic operations for constant-time implementations to mitigate timing attacks, especially those near secret values.
- Zeroization: Ensure that sensitive data, particularly evaluations and results of cryptographic operations, is zeroized after use.
- Randomness: Verify that cryptographically secure randomness is used in all areas requiring randomness, such as field element generation.
Bugs & Errors:

- Potential Panics: Ensure code involving indexing and slicing safely handles all possible ranges and does not lead to panics.
- Error Handling: The new functions returning `Result` types should properly handle and propagate errors instead of potentially ignoring them.
- Integer Overflow/Underflow: Rust's built-in checks mitigate this risk, but manual verification can confirm there are no logical issues leading to unintended overflows.
Performance:

- Redundant Computations: Check whether any polynomial evaluation is duplicated where bit reversal could be unified to reduce computational overhead (e.g., avoid double evaluations when mapping directly to the desired output order).
- Memory Usage: Look for optimization opportunities in `evaluate_polynomial_on_lde_domain_bitrev` where vector manipulations and storage could be streamlined to reduce allocations.
Code Simplicity:
- Abstraction Consistency: The introduction of bit-reversed specific functions increases complexity slightly. Consider whether these can be integrated more harmoniously with existing functions to minimize the expansion of the API surface.
Overall, the proposed changes introduce modifications in FFT and polynomial evaluations while aiming to optimize bit-reversal requirements. These need careful verification in practice to ensure no edge cases or unintended errors arise during both typical and boundary-case usage.
Greptile Summary

Optimizes the STARK prover pipeline by eliminating ~2 redundant O(N) bit-reversal permutations per trace column, composition polynomial part, and FRI layer. The NR DIT FFT naturally produces bit-reversed output, but the previous implementation round-tripped through natural order before bit-reversing again for Merkle tree commitment. Key changes:
The refactor maintains correctness by computing both evaluation orders where needed: natural order for constraint evaluation and opening queries, bit-reversed order for Merkle commitments.

Confidence Score: 5/5
| Filename | Overview |
|---|---|
| crates/math/src/fft/cpu/ops.rs | Added fft_bitrev function to skip final bit-reverse permutation, correctly reuses existing FFT implementation |
| crates/math/src/fft/polynomial.rs | Added evaluate_fft_bitrev and evaluate_offset_fft_bitrev API methods with proper error handling |
| crates/provers/stark/src/fri/mod.rs | Refactored FRI layer construction to use bitrev FFT directly, eliminating redundant permutation |
| crates/provers/stark/src/prover.rs | Dual-path evaluation strategy: natural-order for queries, bit-reversed for Merkle commitments, correctly implemented |
Last reviewed commit: 90166de
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##             main    #1187      +/-  ##
=========================================
+ Coverage   75.69%   75.73%   +0.03%
=========================================
  Files         205      205
  Lines       46577    46667      +90
=========================================
+ Hits        35257    35341      +84
- Misses      11320    11326       +6
```

☔ View full report in Codecov by Sentry.