
Support Scheduled GKR #199

Open

hczphn wants to merge 11 commits into master from hc/schedule

Conversation

@hczphn (Collaborator) commented Feb 3, 2026

This PR introduces several optimizations to the proving system, improves the scheduling flexibility for MPI processes, and extends the LogUp query capabilities.

Key Changes:

  1. Proving Pipeline Optimization
    Simplified Setup: The setup result was computed redundantly and never used, effectively running setup twice. It now returns a default value, significantly reducing setup time.

Lightweight Prove Interface: Optimized the prove interface to eliminate unnecessary dependencies on setup and the full computation graph; the new interface only requires device_memory, improving efficiency and reducing memory overhead.

  2. Scheduled GKR Proving
    Dynamic Process Scheduling: Introduced a scheduled GKR proving mechanism (a hedged sketch of a possible schedule format follows this list).

If a schedule.txt file is provided, MPI processes are allocated and tasks are executed according to the specified order in the file.

If no schedule file is provided, the system falls back to the default execution order.

Location: Implementation can be found in expander_compiler/src/zkcuda/proving_system/expander_no_oversubscribe/prove_impl.rs.

  3. Extended LogUp Support
    Hintless LogUp Queries: Added support for LogUp query interfaces that do not include hints, providing more flexibility for different lookup configurations.

  4. MPI & Environment Configuration
    Configurable MPI Limits: Users can now set the maximum number of MPI processes via an environment variable, allowing for better resource management across hardware environments (see the sketch after this list).
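
For concreteness, here is a minimal sketch of how a schedule file might be parsed. The actual Schedule struct lives in prove_impl.rs; the `task_name: rank rank ...` line format below is an assumption for illustration, not the confirmed on-disk format.

```rust
use std::fs;

// Hypothetical sketch of schedule-file parsing; the `task_name: rank rank ...`
// line format is assumed, not confirmed by the PR.
struct Schedule {
    // Task name -> MPI ranks assigned to it, in file (execution) order.
    tasks: Vec<(String, Vec<usize>)>,
}

impl Schedule {
    fn from_file(path: &str) -> Result<Schedule, String> {
        let content = fs::read_to_string(path).map_err(|e| e.to_string())?;
        let mut tasks = Vec::new();
        for line in content.lines().filter(|l| !l.trim().is_empty()) {
            let (name, ranks) = line
                .split_once(':')
                .ok_or_else(|| format!("Invalid line format: {}", line))?;
            let ranks = ranks
                .split_whitespace()
                .map(|r| r.parse::<usize>().map_err(|e| e.to_string()))
                .collect::<Result<Vec<usize>, String>>()?;
            tasks.push((name.trim().to_string(), ranks));
        }
        Ok(Schedule { tasks })
    }
}
```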
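
And a hedged sketch of the environment-variable handling: the variable name ZKML_NUM_CPUS comes from the review summary below, while the exact parsing and fallback behavior here are assumptions.

```rust
use std::env;

// Cap the MPI process count at ZKML_NUM_CPUS when it is set and parses
// cleanly; otherwise fall back to the detected CPU count. (Assumed behavior.)
fn max_mpi_processes(detected_cpus: usize) -> usize {
    env::var("ZKML_NUM_CPUS")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or(detected_cpus)
}
```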

Impact:
Performance: Drastic reduction in setup time and memory footprint during the proving phase.

Flexibility: Granular control over how MPI ranks handle GKR tasks via external schedule files.

Usability: Improved configuration through environment variables and more versatile query interfaces.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @hczphn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the proving system by introducing a flexible task scheduling mechanism for MPI processes, optimizing the proving pipeline for better performance and reduced memory footprint, and expanding the capabilities of LogUp queries. These changes aim to provide more control over resource utilization and improve the overall efficiency and adaptability of the system.

Highlights

  • Proving Pipeline Optimization: The setup process has been streamlined by returning a default value for redundant setup results, significantly reducing setup time. A new lightweight prove interface has been introduced, eliminating unnecessary dependencies on the full computation graph and setup, thereby improving efficiency and reducing memory overhead.
  • Scheduled GKR Proving: A dynamic GKR proving mechanism has been implemented. If a 'schedule.txt' file is present, MPI processes will execute tasks according to the specified order. Otherwise, the system defaults to its standard execution order. This allows for granular control over task distribution among MPI ranks.
  • Extended LogUp Support: The LogUp query interfaces now support hintless queries, offering greater flexibility for various lookup configurations within the proving system.
  • MPI & Environment Configuration: Users can now configure the maximum number of MPI processes using the 'ZKML_NUM_CPUS' environment variable, enabling better resource management. An additional 'ZKML_FORCE_OVERSUBSCRIBE' environment variable has been added for explicit control over MPI oversubscription.


Changelog
  • circuit-std-rs/src/logup.rs
    • Added final_check_with_query_count method to LogUpSingleKeyTable and LogUpRangeProofTable to support hintless LogUp queries.
  • expander_compiler/src/zkcuda/context.rs
    • Modified kernel handling during context initialization to retrieve kernels by ID from self.kernels instead of draining cg_kernels.
    • Removed the assertion assert!(cg_kernels.is_empty()) to accommodate the new kernel management approach.
  • expander_compiler/src/zkcuda/proving_system/expander_no_oversubscribe/api_no_oversubscribe.rs
    • Introduced prove_lightweight function to ExpanderNoOverSubscribe for proving without requiring the full computation graph or prover setup, optimizing memory usage.
  • expander_compiler/src/zkcuda/proving_system/expander_no_oversubscribe/prove_impl.rs
    • Added mpi_prove_no_oversubscribe_with_schedule function to enable scheduled GKR proving based on an external 'schedule.txt' file.
    • Implemented Schedule struct for parsing task schedules and helper functions for finding peers and unique tasks.
    • Added functionality to parse task mappings from 'task_mapping.txt' and task dependencies from 'task_dependencies.txt'.
    • Introduced file-based synchronization mechanisms (mark_task_completed, wait_for_dependencies) for inter-process communication.
    • Developed create_mpi_subgroup_for_task to dynamically create MPI subgroups for specific tasks.
    • Implemented MPI result collection logic for BATCH_PCS mode to gather proofs and challenges from all subgroup roots to the global root.
  • expander_compiler/src/zkcuda/proving_system/expander_parallelized/client_utils.rs
    • Updated the setup timer name from 'setup' to 'new setup'.
    • Modified MPI size calculation to incorporate the ZKML_NUM_CPUS environment variable for configurable process limits.
    • Changed the setup function to return default ExpanderProverSetup and ExpanderVerifierSetup instead of reading from shared memory, simplifying the setup process.
  • expander_compiler/src/zkcuda/proving_system/expander_parallelized/cmd_utils.rs
    • Added support for the ZKML_FORCE_OVERSUBSCRIBE environment variable to explicitly control MPI oversubscription.
    • Adjusted MPI oversubscription logic to consider the ZKML_NUM_CPUS environment variable for determining available CPU resources.
  • expander_compiler/tests/test_bn254_new_data.rs
    • Added a new test file to verify the load_computation_graph functionality with new BN254 data, ensuring proper behavior with ExpanderNoOverSubscribe and ZKCudaBN254KZGBatchPCS.
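
The file-based synchronization mentioned under prove_impl.rs could look roughly like the following sketch. The function names mark_task_completed and wait_for_dependencies come from the changelog; the marker-file scheme, paths, and polling interval are assumptions.

```rust
use std::fs;
use std::path::Path;
use std::thread;
use std::time::Duration;

// A task signals completion by touching an empty marker file.
fn mark_task_completed(dir: &Path, task: &str) -> std::io::Result<()> {
    fs::write(dir.join(format!("{task}.done")), b"")
}

// Dependents poll until every dependency has written its marker file.
fn wait_for_dependencies(dir: &Path, deps: &[String]) {
    while !deps.iter().all(|d| dir.join(format!("{d}.done")).exists()) {
        thread::sleep(Duration::from_millis(10));
    }
}
```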

@gemini-code-assist (Contributor) bot left a comment

Code Review

This pull request introduces several significant optimizations and features, including a more efficient proving pipeline, scheduled GKR proving with dynamic MPI process allocation, and extended LogUp support. The changes are well-aligned with the description and aim to improve performance and flexibility.

My review has identified a few areas for improvement. Most critically, there is a memory leak in the new schedule-based execution logic due to the use of Box::leak when creating MPI subgroups. I've also noted some minor issues such as unused variables, opportunities for more robust error handling in file parsing, and a potentially unreachable code block that could be clarified or removed for better maintainability. The addition of a new test case for reusing computation graphs is a great way to validate the optimizations made.


// Only participants return a valid MPIConfig
if my_position_opt.is_some() {
let split_comm_static: &'static Option<_> = Box::leak(Box::new(split_comm));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using Box::leak here leaks memory for every MPI communicator created. Since create_mpi_subgroup_for_task is called in a loop, once per unique task, memory is leaked for every task, which can become significant for large schedules.

While obtaining a 'static lifetime for the MPI communicator might be necessary due to the mpi crate's API, this approach leaks for the lifetime of the process.

Consider managing the lifetime of the communicators more carefully. They could be stored in mpi_prove_no_oversubscribe_with_schedule and explicitly dropped when no longer needed. If the mpi crate requires 'static, another approach is an arena allocator or a custom pool for communicators that can be cleaned up at the end of the proving process.
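
A rough sketch of the pool idea, generic over the communicator type. Note this hands out pool-lifetime handles rather than 'static references, so it only helps if the API can accept a non-'static borrow; names are illustrative.

```rust
// Owns all communicators for the proving run; everything is dropped together
// when the pool goes out of scope, instead of being leaked via Box::leak.
struct CommPool<C> {
    comms: Vec<Box<C>>,
}

impl<C> CommPool<C> {
    fn new() -> Self {
        CommPool { comms: Vec::new() }
    }

    // Take ownership of a communicator; return a stable index handle.
    fn store(&mut self, comm: C) -> usize {
        self.comms.push(Box::new(comm));
        self.comms.len() - 1
    }

    fn get(&self, idx: usize) -> &C {
        &self.comms[idx]
    }
}
```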

Comment on lines +346 to +352
let mut inputs = vec![builder.constant(self.table.len() as u32)];
//append table keys
for i in 0..self.table.len() {
inputs.push(self.table[i][0]);
}
//append query keys
inputs.extend(self.query_keys.clone());

medium

The inputs variable is initialized here but it's never used. This block of code appears to be a leftover from the final_check function and can be safely removed.

query_count: &[Variable],
) {
let alpha = builder.get_random_value();
let inputs = self.query_keys.clone();

medium

The inputs variable is initialized here but it's never used and can be removed.


let parts: Vec<&str> = line.split(':').collect();
if parts.len() != 2 {
continue;

medium

Silently ignoring malformed lines in the task mapping file by using continue can make debugging difficult if the file is incorrect. It would be more robust to return an error for invalid lines.

Suggested change
continue;
return Err(format!("Invalid line format: {}", line));
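
As a fuller sketch of the stricter parsing, assuming the `task_name: value` line shape implied by the split(':') call (the helper name is hypothetical):

```rust
// Parse one `task_name: value` line, rejecting malformed input instead of
// silently skipping it.
fn parse_mapping_line(line: &str) -> Result<(String, String), String> {
    let (name, value) = line
        .split_once(':')
        .ok_or_else(|| format!("Invalid line format: {}", line))?;
    Ok((name.trim().to_string(), value.trim().to_string()))
}
```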


let parts: Vec<&str> = line.split(':').collect();
if parts.len() != 2 {
continue;

medium

Silently ignoring malformed lines in the task dependencies file by using continue can make debugging difficult if the file is incorrect. It would be more robust to return an error for invalid lines.

Suggested change
continue;
return Err(format!("Invalid line format: {}", line));

Comment on lines +1070 to +1109
} else {
eprintln!(
"[RANK {}] Executing task {} solo (creating singl
e-rank MPI config)",
my_rank, task_name
);

// Create a single-rank MPI config for solo tasks
// We use the pre-created config from task_mpi_configs
let solo_config = task_mpi_configs.get(task_name).and_then(|c| c.as_ref());

if let Some(config) = solo_config {
prove_kernel_gkr_no_oversubscribe::<
GetFieldConfig<ZC>,
GetTranscript<ZC>,
ZC::ECCConfig,
>(
config,
&computation_graph.kernels()[template.kernel_id()],
&commitment_values,
next_power_of_two(template.parallel_count()),
template.is_broadcast(),
n_bytes_profiler,
)
} else {
// Fallback: use global MPI config for solo task
prove_kernel_gkr_no_oversubscribe::<
GetFieldConfig<ZC>,
GetTranscript<ZC>,
ZC::ECCConfig,
>(
global_mpi_config,
&computation_graph.kernels()[template.kernel_id()],
&commitment_values,
next_power_of_two(template.parallel_count()),
template.is_broadcast(),
n_bytes_profiler,
)
}
};

medium

This else block for handling "solo" tasks seems confusing. The i_am_participant check at line 1015 should ensure that this code is only run by ranks participating in the current task. For a participant, local_mpi_config (retrieved from task_mpi_configs) should be Some(MPIConfig).

This else branch, which is taken when local_mpi_config is None, appears to be unreachable for a participating rank. If it is reachable, it suggests a potential logic issue in how MPI subgroups are created or tracked.

Could you clarify the conditions under which a participating rank would have a None local_mpi_config? If this branch is indeed unreachable, it should be removed to improve clarity.
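
If the branch really is unreachable for participants, one option is to surface the invariant instead of keeping a silent fallback. A self-contained analogue (the types and signature are illustrative stand-ins, not the PR's actual code):

```rust
use std::collections::HashMap;

// Panic with a clear message if a participating rank is missing its
// task-local config, rather than silently falling back to the global one.
fn config_for_task<'a, C>(
    task_mpi_configs: &'a HashMap<String, Option<C>>,
    task_name: &str,
) -> &'a C {
    task_mpi_configs
        .get(task_name)
        .and_then(|c| c.as_ref())
        .expect("participating rank must have a task-local MPI config")
}
```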

Comment on lines +1071 to +1075
eprintln!(
"[RANK {}] Executing task {} solo (creating singl
e-rank MPI config)",
my_rank, task_name
);

medium

There's a typo in this log message. single-rank is split across two lines.

Suggested change
eprintln!(
"[RANK {}] Executing task {} solo (creating singl
e-rank MPI config)",
my_rank, task_name
);
eprintln!(
"[RANK {}] Executing task {} solo (creating single-rank MPI config)",
my_rank, task_name
);
