
chore: improve nullify in registry#2345

Open
SwenSchaeferjohann wants to merge 14 commits into main from swen/opt-nullify

Conversation


@SwenSchaeferjohann SwenSchaeferjohann commented Mar 15, 2026

New registry program instruction: nullify_2

  • Moves proof nodes from instruction data to remaining accounts.
  • This enables the following: if a client sends two of these instructions in one transaction, Solana deduplicates any overlapping proof-node accounts between the two instructions, reducing the overall transaction size.

The PR also adds pairing logic to the forester client for v1 state nullification.
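As a rough illustration of why this helps, here is a back-of-envelope size model (a sketch with illustrative function names, not code from this PR; the 32-byte key and 1-byte account-index sizes come from the Solana legacy message layout):

```rust
// Back-of-envelope: why moving proof nodes into the account list helps.
// Each entry in a transaction's account-keys table costs 32 bytes, while
// an instruction references an account by a 1-byte index.

const PUBKEY_LEN: usize = 32;
const ACCOUNT_INDEX_LEN: usize = 1;

/// Bytes that proof nodes contribute when passed in instruction data:
/// every node is serialized inline, so nothing can be shared.
fn proof_bytes_in_ix_data(num_instructions: usize, nodes_per_proof: usize) -> usize {
    num_instructions * nodes_per_proof * PUBKEY_LEN
}

/// Bytes that proof nodes contribute when passed as remaining accounts:
/// each *unique* key appears once in the account table, plus a 1-byte
/// index per reference in each instruction.
fn proof_bytes_as_accounts(
    num_instructions: usize,
    nodes_per_proof: usize,
    unique_nodes: usize,
) -> usize {
    unique_nodes * PUBKEY_LEN + num_instructions * nodes_per_proof * ACCOUNT_INDEX_LEN
}
```

With two nullifies of 16 proof nodes each sharing 6 nodes (26 unique keys), the remaining-accounts encoding takes 864 bytes of proof-related data versus 1024 bytes inline, and the saving grows with overlap.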

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optimized state nullification instruction variant with enhanced proof handling.
    • Implemented instruction pairing and batching optimization to improve transaction efficiency.
  • Documentation

    • Added comprehensive Forester V1 operational flow documentation covering transaction processing, state nullification, and end-to-end workflows.
  • Chores

    • Added new dependencies for binary encoding and graph matching algorithms.


coderabbitai bot commented Mar 15, 2026

📝 Walkthrough

This PR implements a v2 nullify instruction format with proof node pairing optimization. It adds maximum cardinality matching for batching state nullify instructions, introduces a new nullify_2 program instruction that passes proofs via remaining accounts, refactors proof handling through new instruction wrapper types, and adjusts cache timeout and batching strategy logic in the transaction builder.

Changes

  • Dependencies (forester/Cargo.toml)
    Added bincode (1.3) and mwmatching (0.1.1) dependencies to support proof serialization and graph-based matching algorithms.
  • Documentation (forester/docs/v1_forester_flows.md)
    New comprehensive documentation describing Forester V1 end-to-end flows, including transaction send, state nullify instruction variants (legacy vs v2), pairing strategy, and integrated state tree flow.
  • Instruction Preparation & Wrapping (forester/src/processor/v1/helpers.rs)
    Introduced PreparedV1Instruction enum and StateNullifyInstruction struct to wrap instructions with metadata. Updated fetch_proofs_and_create_instructions to return wrapped instructions instead of raw instructions. Migrated from create_nullify_instruction to create_nullify_2_instruction with field renaming and fixed-size proof array conversion.
  • Transaction Building & Pairing (forester/src/processor/v1/tx_builder.rs)
    Added pairing-enabled batching path with should_attempt_pairing precheck, build_instruction_batches orchestration, and pair_state_nullify_batches using maximum cardinality matching to optimize instruction pairing. Adjusted per-item cache timeout (30s → 15s processing, 30s post-success refresh). Enhanced error handling for "not found" responses from fetch_proofs_and_create_instructions.
  • Program Validation & Helpers (programs/registry/src/account_compression_cpi/nullify.rs)
    Added extract_proof_nodes_from_remaining_accounts and validate_nullify_2_inputs helper functions with unit tests. Validates that nullify_2 inputs contain exactly one element per index slice and that the proof accounts count equals 16.
  • V2 Instruction Factory (programs/registry/src/account_compression_cpi/sdk.rs)
    Introduced CreateNullify2InstructionInputs struct and create_nullify_2_instruction factory function to construct Nullify2 CPI instructions with proof nodes passed as read-only accounts. Added unit tests validating discriminator, account count, and data layout.
  • Error Definitions (programs/registry/src/errors.rs)
    Added three new error variants: EmptyProofAccounts, InvalidProofAccountsLength, and InvalidNullify2Inputs for nullify_2 validation failures.
  • Program Instruction (programs/registry/src/lib.rs)
    New nullify_2 instruction handler that validates inputs, extracts proof nodes from remaining accounts, and delegates to shared process_nullify logic with identical access control.
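Based on the validation described above (exactly one element per index slice and exactly 16 proof accounts), here is a minimal sketch of the check; the function shape and error type mirror names mentioned in this PR but are assumptions, not the actual program code:

```rust
const EXPECTED_PROOF_ACCOUNTS: usize = 16; // one account per proof node

#[derive(Debug, PartialEq)]
enum Nullify2Error {
    InvalidNullify2Inputs,
    InvalidProofAccountsLength,
}

/// Illustrative check mirroring the described validate_nullify_2_inputs:
/// nullify_2 handles exactly one leaf, so every index vector must hold a
/// single element, and the remaining accounts must carry a full proof.
fn validate_nullify_2_inputs(
    change_log_indices: &[u64],
    leaves_queue_indices: &[u16],
    indices: &[u64],
    proof_account_count: usize,
) -> Result<(), Nullify2Error> {
    if change_log_indices.len() != 1 || leaves_queue_indices.len() != 1 || indices.len() != 1 {
        return Err(Nullify2Error::InvalidNullify2Inputs);
    }
    if proof_account_count != EXPECTED_PROOF_ACCOUNTS {
        return Err(Nullify2Error::InvalidProofAccountsLength);
    }
    Ok(())
}
```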

Sequence Diagram

```mermaid
sequenceDiagram
    participant Forester as Forester V1<br/>Processor
    participant Builder as Transaction<br/>Builder
    participant Matcher as Pairing<br/>Matcher
    participant Registry as Registry<br/>Program

    Forester->>Builder: fetch_proofs_and_create_instructions()<br/>(prepared instructions)
    Builder->>Builder: count state_nullify instances
    Builder->>Builder: should_attempt_pairing()<br/>(check block height, RPC limits)

    alt Pairing Allowed
        Builder->>Matcher: pair_state_nullify_batches()<br/>(maximum cardinality matching)
        Matcher->>Matcher: build pairing graph<br/>(transaction size feasibility)
        Matcher->>Matcher: compute max matching
        Matcher->>Builder: paired + unpaired batches
        Builder->>Builder: build_instruction_batches()<br/>(AddressUpdate + paired StateNullify)
    else Pairing Disabled
        Builder->>Builder: build_instruction_batches()<br/>(flat instruction sequence)
    end

    Builder->>Registry: create transactions<br/>(nullify_2 instructions)
    Registry->>Registry: validate_nullify_2_inputs()
    Registry->>Registry: extract_proof_nodes_from_remaining_accounts()
    Registry->>Registry: process_nullify()
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The heterogeneous nature of changes across processor logic, transaction batching, program-side validation, and instruction factory patterns—combined with the introduction of graph-based matching algorithms and wrapper type refactoring—requires careful reasoning about integration points, proof handling correctness, and matching algorithm soundness.


Suggested labels

ai-review

Suggested reviewers

  • ananas-block
  • sergeytimoshin

Poem

🎯 Proofs pair up with matching grace,
Remaining accounts find their place,
From nullify to nullify_2 we flow,
Maximum cardinality steals the show! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Title check: ⚠️ Warning
  The title 'chore: improve nullify in registry' is vague and does not clearly convey the main change. It fails to mention the key technical innovation: introducing a nullify_2 instruction that moves proof nodes to remaining accounts to enable transaction size optimization through proof deduplication and pairing logic.
  Resolution: Revise the title to be more specific and descriptive of the primary change, such as 'feat: introduce nullify_2 instruction with proof account deduplication' or 'feat: add nullify_2 with pairing optimization for proof node deduplication'.

Docstring Coverage: ⚠️ Warning
  Docstring coverage is 48.39%, which is below the required threshold of 70.00%.
  Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Description Check: ✅ Passed
  Check skipped: CodeRabbit's high-level summary is enabled.


@SwenSchaeferjohann (author) commented:

Mechanism:

  • Each nullify has 16 proof accounts (pubkeys).
  • Solana deduplicates account keys within a transaction.
  • If two nullifies share proof nodes, the combined tx is smaller.
  • Pairing is only attempted when the combined tx fits under 1232 bytes.

Algorithm:

  • Build a graph of all state nullifies.
  • For each pair (i, j): if a 2-instruction tx fits, add an edge with weight 10_000 + overlap (overlap = number of shared proof nodes).
  • Run maximum cardinality matching to pair as many instructions as possible, preferring higher overlap.
  • Paired instructions go into one tx; unpaired ones go into single-instruction txs.
  • Batches are ordered by leaf index for deterministic behavior.

Guards:

  • Pairing only if batch_size >= 2.
  • Pairing only if enough blocks remain before blockhash expiry (> 25).
  • Pairing only if the nullify count is ≤ 96 and pair candidates are ≤ 2000 (to cap matching cost).
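The pair-edge construction described above can be sketched as follows; key types and function names here are illustrative, and the actual maximum cardinality matching is delegated to the mwmatching crate in the PR:

```rust
use std::collections::HashSet;

const MAX_TX_SIZE: usize = 1232; // Solana packet-size limit
const BASE_EDGE_WEIGHT: i64 = 10_000;

/// Number of proof nodes shared between two nullify instructions
/// (keys modeled as raw [u8; 32] pubkeys for the sketch).
fn overlap(a: &[[u8; 32]], b: &[[u8; 32]]) -> usize {
    let set: HashSet<&[u8; 32]> = a.iter().collect();
    b.iter().filter(|k| set.contains(*k)).count()
}

/// Edge weight as described: a large constant so the matcher first
/// maximizes the number of pairs, plus overlap as a tie-breaker that
/// prefers pairs sharing more proof nodes. `combined_tx_size` is assumed
/// to come from a separate feasibility check; None means "no edge".
fn edge_weight(a: &[[u8; 32]], b: &[[u8; 32]], combined_tx_size: usize) -> Option<i64> {
    if combined_tx_size > MAX_TX_SIZE {
        return None;
    }
    Some(BASE_EDGE_WEIGHT + overlap(a, b) as i64)
}
```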

@SwenSchaeferjohann SwenSchaeferjohann marked this pull request as ready for review March 17, 2026 12:54

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
forester/src/processor/v1/helpers.rs (1)

202-205: ⚠️ Potential issue | 🟡 Minor

Retry count in failure text is off by one.

The loop executes MAX_RETRIES + 1 attempts (0..=MAX), but the emitted error strings report only MAX_RETRIES.

🧩 Suggested fix
-                    "Failed to get address proofs for batch {} after {} retries: {}",
+                    "Failed to get address proofs for batch {} after {} attempts: {}",
                     batch_idx,
-                    ADDRESS_PROOF_MAX_RETRIES,
+                    ADDRESS_PROOF_MAX_RETRIES + 1,
                     e
                 ));
...
-                "Failed to get state proofs after {} retries: {}",
-                ADDRESS_PROOF_MAX_RETRIES,
+                "Failed to get state proofs after {} attempts: {}",
+                ADDRESS_PROOF_MAX_RETRIES + 1,
                 e
             ));

Also applies to: 332-334

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v1/helpers.rs` around lines 202 - 205, The failure log
messages that print the retry count use ADDRESS_PROOF_MAX_RETRIES but the loop
does 0..=ADDRESS_PROOF_MAX_RETRIES (i.e., MAX+1 attempts), so update the logged
count to reflect the actual number of attempts (e.g., use
ADDRESS_PROOF_MAX_RETRIES + 1 or log an explicit attempts variable). Locate the
log call that contains the string "Failed to get address proofs for batch {}
after {} retries: {}" and replace the second argument with
ADDRESS_PROOF_MAX_RETRIES + 1 (or the attempts counter), and make the same
change for the other occurrence around the "332-334" area so both messages
report the correct attempt count.
forester/src/processor/v1/tx_builder.rs (1)

191-193: ⚠️ Potential issue | 🟠 Major

Avoid swallowing generic “not found” errors.

contains("not found") is too broad and can mask unrelated upstream failures. Restrict this path to explicit missing-proof/indexer signals only.

🔧 Suggested fix
-                return if err_str.to_lowercase().contains("record not found")
-                    || err_str.to_lowercase().contains("not found")
+                let lower = err_str.to_lowercase();
+                return if lower.contains("record not found")
                 {
                     warn!("Record not found in indexer, skipping batch: {}", e);
                     // Return empty transactions but don't propagate the error
                     Ok((vec![], last_valid_block_height))
                 } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v1/tx_builder.rs` around lines 191 - 193, The current
check uses err_str.contains("not found"), which is too broad; update the error
handling in tx_builder (the block using err_str) to match only explicit
missing-proof/indexer signals instead of generic substrings: remove the generic
contains("not found") clause and either match on concrete error variants/enums
returned by your indexer/proof APIs (e.g., ProofNotFound,
IndexerError::NotFound) or check for specific, exact error strings like "record
not found" or the exact message your proof/indexer layers emit; ensure you use
pattern matching on the error type where possible (or exact equals on err_str)
so unrelated upstream "not found" messages aren’t swallowed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3d8517da-43e7-4704-8b60-21dd1a233c65

📥 Commits

Reviewing files that changed from the base of the PR and between 2644529 and d07b51a.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock and included by none
  • program-tests/registry-test/tests/nullify_2_regression.rs is excluded by none and included by none
📒 Files selected for processing (8)
  • forester/Cargo.toml
  • forester/docs/v1_forester_flows.md
  • forester/src/processor/v1/helpers.rs
  • forester/src/processor/v1/tx_builder.rs
  • programs/registry/src/account_compression_cpi/nullify.rs
  • programs/registry/src/account_compression_cpi/sdk.rs
  • programs/registry/src/errors.rs
  • programs/registry/src/lib.rs


Comment on forester/docs/v1_forester_flows.md, line 5:

    ## 1. Transaction Send Flow (Blockhash)

    ```

⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks.

These blocks trigger markdownlint MD040.

🛠 Suggested patch
-```
+```text
 ...
-```
+```


-```
+```text
 ...
-```
+```


-```
+```text
 ...
-```
+```


-```
+```text
 ...
-```
+```

Also applies to: 59-59, 87-87, 131-131

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/docs/v1_forester_flows.md` at line 5, Several fenced code blocks in
v1_forester_flows.md are missing language identifiers (plain triple-backtick
blocks at the shown diffs and also at the other locations noted), which triggers
markdownlint MD040; update each triple-backtick block so it starts with a
language tag (e.g., ```text or ```yaml or ```json as appropriate) instead of
bare ```, keeping the existing block content unchanged, specifically adjust the
opening fences for the blocks shown in the diff and the additional occurrences
around the other noted locations (lines referenced in the review) so every code
fence includes a language identifier.

Comment on forester/docs/v1_forester_flows.md, line 67:

    LEGACY (proof in ix data)         v2 (proof in remaining_accounts)
    ───────────────────────           ────────────────────────────────────
    create_nullify_instruction()      create_nullify_with_proof_accounts_instruction()

⚠️ Potential issue | 🟠 Major

Use the actual v2 SDK function name in this flow.

This line references create_nullify_with_proof_accounts_instruction(), but the implementation path uses create_nullify_2_instruction (see forester/src/processor/v1/helpers.rs, Line 391).

As per coding guidelines "Cross-reference any mentioned function names, parameters, return types, and error conditions with the source code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/docs/v1_forester_flows.md` at line 67, The docs reference the wrong
SDK function name: replace create_nullify_with_proof_accounts_instruction() with
the actual v2 function name create_nullify_2_instruction in the flow and ensure
any mentioned parameters/returns match the signature of
create_nullify_2_instruction; cross-check the helper implementation
(create_nullify_2_instruction) and update the documentation line so it
accurately reflects the real function name and its parameter/return names.

Comment on forester/docs/v1_forester_flows.md, lines 105 to 106:

    │ - pair_candidates = n*(n-1)/2 <= 2000 (MAX_PAIR_CANDIDATES)
    │ - state_nullify_count <= 96 (MAX_PAIRING_INSTRUCTIONS)

⚠️ Potential issue | 🟠 Major

Pairing limits in docs are out of sync with code.

The docs state <= 2000 candidates and <= 96 instructions, while forester/src/processor/v1/tx_builder.rs currently enforces MAX_PAIR_CANDIDATES = 4_950 and MAX_PAIRING_INSTRUCTIONS = 100.

As per coding guidelines "Ensure documentation describes the actual behavior, not outdated or planned behavior".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/docs/v1_forester_flows.md` around lines 105 - 106, The docs values
for pair_candidates and state_nullify_count are stale; update the documentation
to match the actual enforced constants in code by replacing the documented
limits with MAX_PAIR_CANDIDATES = 4_950 and MAX_PAIRING_INSTRUCTIONS = 100 (the
constants named MAX_PAIR_CANDIDATES and MAX_PAIRING_INSTRUCTIONS in
tx_builder.rs), or alternatively change those constants if you intend the docs
to be the source of truth—ensure the numbers in the docs and the values of the
constants stay identical.

Comment on forester/src/processor/v1/tx_builder.rs, lines 385 to 388:

    paired_batches.sort_by_key(|(leaf, _)| *leaf);
    single_batches.sort_by_key(|(leaf, _)| *leaf);
    paired_batches.extend(single_batches);
    Ok(paired_batches.into_iter().map(|(_, batch)| batch).collect())

⚠️ Potential issue | 🟠 Major

Final batch order is not globally leaf-sorted.

You sort paired and single batches separately, then append singles; this can break deterministic global order by leaf index.

🔁 Suggested fix
-    paired_batches.sort_by_key(|(leaf, _)| *leaf);
-    single_batches.sort_by_key(|(leaf, _)| *leaf);
-    paired_batches.extend(single_batches);
-    Ok(paired_batches.into_iter().map(|(_, batch)| batch).collect())
+    let mut all_batches = Vec::with_capacity(paired_batches.len() + single_batches.len());
+    all_batches.extend(paired_batches);
+    all_batches.extend(single_batches);
+    all_batches.sort_by_key(|(leaf, _)| *leaf);
+    Ok(all_batches.into_iter().map(|(_, batch)| batch).collect())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v1/tx_builder.rs` around lines 385 - 388, The current
logic sorts paired_batches and single_batches separately then extends
paired_batches with single_batches, which can produce a non-globally-sorted
result; change the code so that after combining the two lists (e.g. after
paired_batches.extend(single_batches)) you perform a single global sort_by_key
on paired_batches by the leaf key (the same key used in
paired_batches.sort_by_key and single_batches.sort_by_key), then map/collect the
batches as before (the paired_batches.into_iter().map(|(_, batch)|
batch).collect()) to guarantee deterministic global leaf ordering.

On the nullify_2 signature:

    pub fn nullify_2<'info>(
        ctx: Context<'_, '_, '_, 'info, NullifyLeaves<'info>>,
        bump: u8,
        change_log_indices: Vec<u64>,

Suggested change:

    - change_log_indices: Vec<u64>,
    + first_change_log_index: u16,

The second index should be (first_change_log_index + 1) % changelog_length.
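A sketch of that derivation (a hypothetical helper, not code from this PR), with parentheses making the intended precedence explicit:

```rust
/// Derive both changelog indices from the first one, as suggested:
/// the second slot is the next entry in the circular changelog buffer,
/// wrapping around at changelog_length.
fn derive_change_log_indices(first: u16, changelog_length: u16) -> [u16; 2] {
    [first, (first + 1) % changelog_length]
}
```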

    ctx: Context<'_, '_, '_, 'info, NullifyLeaves<'info>>,
    bump: u8,
    change_log_indices: Vec<u64>,
    leaves_queue_indices: Vec<u16>,

Suggested change:

    - leaves_queue_indices: Vec<u16>,
    + leaves_queue_indices: [u16;2],

    bump: u8,
    change_log_indices: Vec<u64>,
    leaves_queue_indices: Vec<u16>,
    indices: Vec<u64>,

@ananas-block commented Mar 17, 2026


Suggested change:

    - indices: Vec<u64>,
    + indices: [u64; 2],

What index? If it is the leaf index, we can use u32.
