Summary of Changes

Hello @hczphn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the proving system by introducing a flexible task scheduling mechanism for MPI processes, optimizing the proving pipeline for better performance and a reduced memory footprint, and expanding the capabilities of LogUp queries. These changes aim to provide more control over resource utilization and improve the overall efficiency and adaptability of the system.
Code Review
This pull request introduces several significant optimizations and features, including a more efficient proving pipeline, scheduled GKR proving with dynamic MPI process allocation, and extended LogUp support. The changes are well-aligned with the description and aim to improve performance and flexibility.
My review has identified a few areas for improvement. Most critically, there is a memory leak in the new schedule-based execution logic due to the use of Box::leak when creating MPI subgroups. I've also noted some minor issues such as unused variables, opportunities for more robust error handling in file parsing, and a potentially unreachable code block that could be clarified or removed for better maintainability. The addition of a new test case for reusing computation graphs is a great way to validate the optimizations made.
// Only participants return a valid MPIConfig
if my_position_opt.is_some() {
    let split_comm_static: &'static Option<_> = Box::leak(Box::new(split_comm));
Using Box::leak here leaks memory for every MPI communicator created. Since create_mpi_subgroup_for_task is called in a loop, once per unique task, this leaks an allocation for every task, which can become significant for large schedules.
While obtaining a 'static lifetime for the MPI communicator might be necessary due to the mpi crate's API, this approach leaks memory that is never reclaimed.
Consider managing the lifetime of the communicators more carefully. Perhaps they can be stored in the mpi_prove_no_oversubscribe_with_schedule function and explicitly dropped when they are no longer needed. If mpi-rs requires 'static, another approach might be to use an arena allocator or a custom pool for communicators that can be cleaned up at the end of the proving process.
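For illustration only, one possible shape for such a pool, assuming the mpi API really does force a `'static` borrow; `CommPool` and its methods are hypothetical and not part of this PR:

```rust
/// Hypothetical pool that hands out 'static references while remembering the
/// allocations, so they can be reclaimed once proving is finished.
struct CommPool<T> {
    leaked: Vec<*mut T>,
}

impl<T> CommPool<T> {
    fn new() -> Self {
        Self { leaked: Vec::new() }
    }

    /// Leaks `value` to obtain a 'static reference, but records the pointer
    /// for later cleanup instead of forgetting it.
    fn lease(&mut self, value: T) -> &'static T {
        let ptr = Box::into_raw(Box::new(value));
        self.leaked.push(ptr);
        // SAFETY: the allocation stays alive until `release_all`, which the
        // caller must only invoke after all borrowers are done with it.
        unsafe { &*ptr }
    }

    /// Reclaims every leased allocation, e.g. at the end of
    /// mpi_prove_no_oversubscribe_with_schedule.
    ///
    /// SAFETY: callers must guarantee no leased references are still in use.
    unsafe fn release_all(&mut self) {
        for ptr in self.leaked.drain(..) {
            unsafe { drop(Box::from_raw(ptr)) };
        }
    }
}
```

The schedule runner could own a single `CommPool`, lease each per-task communicator from it, and call `release_all` once every proof has been produced, so nothing outlives the proving run.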
let mut inputs = vec![builder.constant(self.table.len() as u32)];
// append table keys
for i in 0..self.table.len() {
    inputs.push(self.table[i][0]);
}
// append query keys
inputs.extend(self.query_keys.clone());

    query_count: &[Variable],
) {
    let alpha = builder.get_random_value();
    let inputs = self.query_keys.clone();
let parts: Vec<&str> = line.split(':').collect();
if parts.len() != 2 {
    continue;
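The review summary above also mentions room for more robust error handling in file parsing; silently `continue`-ing past malformed lines can hide mistakes in a hand-written schedule file. A minimal sketch of a stricter parse (the helper and its signature are hypothetical, not code from this PR):

```rust
/// Hypothetical helper: reports malformed schedule lines instead of skipping
/// them silently, and returns the parsed key/value pair otherwise.
fn parse_schedule_line(line_no: usize, line: &str) -> Option<(String, String)> {
    let parts: Vec<&str> = line.split(':').collect();
    if parts.len() != 2 {
        eprintln!(
            "[schedule] warning: line {} is malformed (expected `key: value`): {:?}",
            line_no, line
        );
        return None;
    }
    Some((parts[0].trim().to_string(), parts[1].trim().to_string()))
}
```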
} else {
    eprintln!(
        "[RANK {}] Executing task {} solo (creating single-rank MPI config)",
        my_rank, task_name
    );

    // Create a single-rank MPI config for solo tasks
    // We use the pre-created config from task_mpi_configs
    let solo_config = task_mpi_configs.get(task_name).and_then(|c| c.as_ref());

    if let Some(config) = solo_config {
        prove_kernel_gkr_no_oversubscribe::<
            GetFieldConfig<ZC>,
            GetTranscript<ZC>,
            ZC::ECCConfig,
        >(
            config,
            &computation_graph.kernels()[template.kernel_id()],
            &commitment_values,
            next_power_of_two(template.parallel_count()),
            template.is_broadcast(),
            n_bytes_profiler,
        )
    } else {
        // Fallback: use global MPI config for solo task
        prove_kernel_gkr_no_oversubscribe::<
            GetFieldConfig<ZC>,
            GetTranscript<ZC>,
            ZC::ECCConfig,
        >(
            global_mpi_config,
            &computation_graph.kernels()[template.kernel_id()],
            &commitment_values,
            next_power_of_two(template.parallel_count()),
            template.is_broadcast(),
            n_bytes_profiler,
        )
    }
};
This else block for handling "solo" tasks seems confusing. The i_am_participant check at line 1015 should ensure that this code is only run by ranks participating in the current task. For a participant, local_mpi_config (retrieved from task_mpi_configs) should be Some(MPIConfig).
This else branch, which is taken when local_mpi_config is None, appears to be unreachable for a participating rank. If it is reachable, it suggests a potential logic issue in how MPI subgroups are created or tracked.
Could you clarify the conditions under which a participating rank would have a None local_mpi_config? If this branch is indeed unreachable, it should be removed to improve clarity.
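If the branch really is unreachable, one way to make that assumption explicit rather than silently falling back to the global config is sketched below; the function, its generic parameter, and its arguments are placeholders standing in for the identifiers in the diff above:

```rust
/// Sketch: a participating rank is assumed to always have a per-task config,
/// so a missing one is treated as a logic error instead of a silent fallback.
fn pick_config<'a, C>(solo_config: Option<&'a C>, my_rank: usize, task_name: &str) -> &'a C {
    solo_config.unwrap_or_else(|| {
        unreachable!(
            "[RANK {}] participating rank has no MPI config for task {}",
            my_rank, task_name
        )
    })
}
```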
This PR introduces several optimizations to the proving system, improves the scheduling flexibility for MPI processes, and extends the LogUp query capabilities.
Key Changes:
Simplified Setup: The setup result was previously computed redundantly and never used, effectively causing the system to perform setup twice. I have modified it to return a default value instead, significantly reducing setup time.
Lightweight Prove Interface: Optimized the prove interface to eliminate unnecessary dependencies on setup and the full computation graph. The new interface is lightweight and only requires device_memory, improving efficiency and reducing memory overhead.
Dynamic Process Scheduling: Introduced a scheduled GKR proving mechanism.
If a schedule.txt file is provided, MPI processes are allocated and tasks are executed in the order specified in the file.
If no schedule file is provided, the system falls back to the default execution order.
Location: Implementation can be found in expander_compiler/src/zkcuda/proving_system/expander_no_oversubscribe/prove_impl.rs.
Extended LogUp Support
Hintless LogUp Queries: Added support for LogUp query interfaces that do not include hints, providing more flexibility for different lookup configurations.
MPI & Environment Configuration
Configurable MPI Limits: Users can now set the maximum number of MPI processes via an environment variable, allowing for better resource management in different hardware environments.
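For reference, reading such a limit typically looks like the sketch below; the variable name `ZKCUDA_MAX_MPI_PROCESSES` is a placeholder, not necessarily the name this PR defines:

```rust
use std::env;

/// Sketch: caps the number of MPI processes via an environment variable,
/// falling back to `default` when the variable is unset or invalid.
fn max_mpi_processes(default: usize) -> usize {
    env::var("ZKCUDA_MAX_MPI_PROCESSES")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}
```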
Impact:
Performance: Drastic reduction in setup time and memory footprint during the proving phase.
Flexibility: Granular control over how MPI ranks handle GKR tasks via external schedule files.
Usability: Improved configuration through environment variables and more versatile query interfaces.